Method and system for classifying a biological sample

ABSTRACT

The present invention relates to a method of training a classification system for characterising a biological sample, a diagnostic classification system, as well as a method of characterising a condition in an animal or a human being by using parameters obtained from the sample. The invention relates to classification based on physical parameters obtained from luminescence spectroscopy on light emitted from the sample. The data obtained from a spectrofluorimetric analysis can be considered a finger-print of the sample. Each sample gives rise to a unique spectrofluorometric set of physical parameters. By analysing the fluorescence data, it is possible to classify samples into two or more classes based on the fluorescence spectra, such as classifying with respect to presence/absence of a specific disease, group of diseases or risk of later attaining a specific disease or a body condition, or concentration of a specific compound or medicine.

[0001] The present invention relates to a method of training a classification system for characterising a biological sample, a diagnostic classification system, as well as a method of characterising a condition in an animal or a human being by using parameters obtained from the sample.

BACKGROUND

[0002] A need for a fast and reliable primary diagnostic tool providing information indicative of a disease or a group of diseases has existed for years.

[0003] In U.S. Pat. No. 4,755,684 (Leiner et al.) a method for tumor diagnosis by means of serum tests is disclosed. The method includes excitation of the serum by excitation radiation of at least one wavelength between 250 nm and 300 nm, and its fluorescence intensity is measured at predetermined emission wavelengths. From deviations of these measured values, a conclusion may be drawn with respect to the presence of a neoplastic disease. Measurements at one or two excitation wavelengths are suggested. Up to three emission wavelengths are determined for each excitation wavelength and an intensity value is determined. Since very little information from the fluorescence spectroscopy is used, the diagnosis is very rough and unreliable. Only about 60% are diagnosed correctly and the diagnosis is limited to a yes or no.

[0004] In WO 96/30746 and WO 98/24369 fluorescence spectra are used to screen tissue samples in situ, wherein the tissue suspected to be dysplastic tissue is directly subjected to fluorescence spectroscopy. The methods are used to distinguish between dysplastic cervical tissue and normal cervical tissue. In O'Brien K. M. et al “Development and evaluation of spectral classification algorithms for fluorescence guided laser angioplasty”, IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, vol. 36, No. 4, April 1989, pages 424-4430, fluorescence spectroscopy is used to distinguish normal arterial tissue from atherosclerotic tissue. None of these methods allows a specific diagnosis to be made based on analysis of spectra from tissue or body fluids not directly related to the diseased tissue.

[0005] In U.S. Pat. No. 5,734,587 a method of analyzing sample liquids by generating infrared spectra of dried samples and evaluating them using a multivariate evaluation procedure is disclosed. In the evaluation procedure the samples are assigned to classes. The evaluation procedure is trained with samples of known classes to adjust the parameters of the evaluation procedure, such that samples of unknown classification can be assigned to known classes. The samples analysed are clinically relevant liquid samples that have to be dried before generating the infrared spectra of the samples due to the nature of infrared spectra.

[0006] Most organic compounds absorb light in the visible or ultraviolet part of the electro-magnetic spectrum. Many molecules emit the absorbed excitation energy in the form of fluorescence. A fluorescence spectrum is obtained by transmitting light to the sample (excitation light) and determining the spectral distribution of the light emitted from the sample. In the case where only one fluorescent compound is present in a weakly absorbing solution, the spectral profile of the fluorescence will be invariant with respect to the excitation wavelength. Only the intensity of the fluorescence will vary with the wavelength of the excitation light in accordance with the absorption spectrum.

[0007] If more than one fluorescent compound is present in the solution, the relation between excitation and emission intensities will rapidly increase to a very high level of complexity. The individual compounds will absorb differently for each excitation wavelength, the intensity and distribution of the fluorescence will vary with excitation wavelength, and reabsorption of emitted photons might occur.

[0008] When a series of fluorescence spectra using different excitation wavelengths are recorded, the spectra collected represent an emission-excitation-matrix (EEM), which can be displayed as a 3-dimensional landscape (FIG. 1). The EEM is specific for the specific mixture of compounds and the conditions under which it is measured.
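The relationship between compounds and the EEM described above can be illustrated with a minimal sketch. It assumes an idealised, weakly absorbing solution without reabsorption, in which each compound contributes the product of an excitation band and an emission band. The Gaussian band shapes, the band positions and the function names are illustrative assumptions and are not taken from the description.

```python
import math

def gaussian(x, centre, width):
    """Unit-height Gaussian band, used here as a stand-in for a real spectrum."""
    return math.exp(-0.5 * ((x - centre) / width) ** 2)

def simulate_eem(compounds, excitation_nm, emission_nm):
    """Return an EEM as a list of rows, one row per excitation wavelength.

    Each compound is a tuple (concentration, ex_centre, ex_width,
    em_centre, em_width); all values are hypothetical.
    """
    eem = []
    for ex in excitation_nm:
        row = []
        for em in emission_nm:
            total = 0.0
            for conc, ex_c, ex_w, em_c, em_w in compounds:
                # Idealised contribution: absorption at ex times emission at em.
                total += conc * gaussian(ex, ex_c, ex_w) * gaussian(em, em_c, em_w)
            row.append(total)
        eem.append(row)
    return eem

excitation_nm = [230, 240, 290, 340]
emission_nm = list(range(300, 501, 10))

# One compound: every emission spectrum has the same shape, only scaled.
single = simulate_eem([(1.0, 280, 30, 350, 25)], excitation_nm, emission_nm)

# Two compounds: the emission profile now changes with excitation wavelength.
mixture = simulate_eem(
    [(1.0, 280, 30, 350, 25), (0.5, 340, 25, 450, 30)],
    excitation_nm,
    emission_nm,
)
```

In the single-compound case the rows of the matrix are proportional to each other, whereas in the mixture they are not, which is the added complexity that the EEM captures.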

SUMMARY

[0009] It has been an object of the present invention to provide a method capable of classifying samples with unknown properties in a system not requiring any drying, enrichment, separation or concentration of the sample before determining the class to which the sample belongs.

[0010] This has been possible by subjecting the sample to fluorescence spectroscopy or a variant thereof, whereby liquid as well as solid samples may be classified.

[0011] Thus, in a first aspect the present invention relates to a method of training a classification system for characterising a biological sample with respect to at least one condition, comprising

[0012] a) obtaining a biological sample from an animal, including a human, wherein said biological sample is selected from body fluids and/or tissue, wherein the tissue sample is not associated with said condition(s),

[0013] b) obtaining characterisation information related to each biological sample,

[0014] c) exposing the sample to excitation light within a predetermined range of wavelength,

[0015] d) determining physical parameter(s) of light emitted from the sample,

[0016] e) repeating steps a) to d) until the physical parameters of all training samples have been determined,

[0017] f) optionally performing a data handling of the obtained physical parameters, obtaining data variables,

[0018] g) optionally performing a multivariate data analysis of the data variables, obtaining model parameters describing the variation of the data variables,

[0019] h) classifying the biological samples into at least two different classes correlated to the characterisation information, obtaining a trained classification system.

[0020] In a preferred embodiment the method comprises the steps of:

[0021] a) obtaining a biological sample from an animal, including a human, wherein said biological sample is selected from body fluids and/or tissue, wherein the tissue sample is not associated with said condition(s),

[0022] b) obtaining characterisation information related to each biological sample,

[0023] c) exposing the sample to excitation light within a predetermined range of wavelength,

[0024] d) determining physical parameter(s) of light emitted from the sample,

[0025] e) repeating steps a) to d) until the physical parameters of all training samples have been determined,

[0026] f) performing a data handling of the obtained physical parameters, obtaining data variables,

[0027] g) optionally performing a multivariate data analysis of the data variables, obtaining model parameters describing the variation of the data variables,

[0028] h) classifying the biological samples into at least two different classes correlated to the characterisation information, obtaining a trained classification system.

[0029] In another preferred embodiment the method comprises the steps of:

[0030] a) obtaining a biological sample from an animal, including a human, wherein said biological sample is selected from body fluids and/or tissue, wherein the tissue sample is not associated with said condition(s),

[0031] b) obtaining characterisation information related to each biological sample,

[0032] c) exposing the sample to excitation light within a predetermined range of wavelength,

[0033] d) determining physical parameter(s) of light emitted from the sample,

[0034] e) repeating steps a) to d) until the physical parameters of all training samples have been determined,

[0035] f) performing a data handling of the obtained physical parameters, obtaining data variables,

[0036] g) performing a multivariate data analysis of the data variables, obtaining model parameters describing the variation of the data variables,

[0037] h) classifying the biological samples into at least two different classes correlated to the characterisation information, obtaining a trained classification system.
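The final step of the training method, in which samples are classified into classes correlated to the characterisation information, can be sketched as follows. The text above does not prescribe a particular classifier, so this minimal sketch substitutes a simple nearest-centroid rule as an assumption. The function names, class labels and numerical values are made up for illustration only.

```python
def train_nearest_centroid(data_variables, class_labels):
    """Compute one mean vector (centroid) per class from the training samples."""
    sums = {}
    counts = {}
    for vector, label in zip(data_variables, class_labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(model, vector):
    """Assign a sample to the class whose centroid is closest (squared distance)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: squared_distance(model[label], vector))

# Hypothetical data variables for four training samples and their known classes
# (the characterisation information of step b).
training_vectors = [
    [1.0, 0.2, 0.1],
    [0.9, 0.3, 0.1],
    [0.2, 0.8, 0.7],
    [0.1, 0.9, 0.6],
]
training_labels = ["class A", "class A", "class B", "class B"]

model = train_nearest_centroid(training_vectors, training_labels)
predicted = classify(model, [0.15, 0.85, 0.65])
```

The stored centroids play the role of the trained classification system against which a later, unknown sample is correlated.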

[0038] In another aspect the present invention relates to a classification system for characterising a biological sample, said system comprising:

[0039] a) a sample domain for comprising a biological sample,

[0040] b) light means for exposing the sample to excitation light in the sample domain,

[0041] c) a detecting means recording the physical parameter(s) of light emitted from the sample,

[0042] d) optionally computing means for performing data handling of the physical parameters, obtaining data variables,

[0043] e) optionally processing means for providing model parameters from data variables of the sample,

[0044] f) at least one storage means for storing physical parameters and/or data variables and/or model parameters of the biological sample,

[0045] g) at least one storage means for storing physical parameters and/or data variables and/or model parameters and characterisation information of a trained classification system,

[0046] h) means for correlating physical parameters and/or data variables and/or model parameters from the sample with physical parameters and/or data variables and/or model parameters of the trained system, and

[0047] i) means for displaying the characterisation class(es) of a sample.

[0048] In a preferred embodiment the system comprises:

[0049] a) a sample domain for comprising a biological sample,

[0050] b) light means for exposing the sample to excitation light in the sample domain,

[0051] c) a detecting means recording the physical parameter(s) of light emitted from the sample,

[0052] d) computing means for performing data handling of the physical parameters, obtaining data variables,

[0053] e) optionally processing means for providing model parameters from data variables of the sample,

[0054] f) at least one storage means for storing physical parameters and/or data variables and/or model parameters of the biological sample,

[0055] g) at least one storage means for storing physical parameters and/or data variables and/or model parameters and characterisation information of a trained classification system,

[0056] h) means for correlating physical parameters and/or data variables and/or model parameters from the sample with physical parameters and/or data variables and/or model parameters of the trained system, and

[0057] i) means for displaying the characterisation class(es) of a sample.

[0058] In another preferred embodiment the system comprises:

[0059] a) a sample domain for comprising a biological sample,

[0060] b) light means for exposing the sample to excitation light in the sample domain,

[0061] c) a detecting means recording the physical parameter(s) of light emitted from the sample,

[0062] d) computing means for performing data handling of the physical parameters, obtaining data variables,

[0063] e) processing means for providing model parameters from data variables of the sample,

[0064] f) at least one storage means for storing physical parameters and/or data variables and/or model parameters of the biological sample,

[0065] g) at least one storage means for storing physical parameters and/or data variables and/or model parameters and characterisation information of a trained classification system,

[0066] h) means for correlating physical parameters and/or data variables and/or model parameters from the sample with physical parameters and/or data variables and/or model parameters of the trained system, and

[0067] i) means for displaying the characterisation class(es) of a sample.

[0068] In yet another aspect the invention relates to a method for characterising a biological sample of an animal, including a human, comprising

[0069] a) obtaining a biological sample from the animal or human,

[0070] b) exposing the sample to excitation light,

[0071] c) determining the physical parameter(s) of light emitted from the sample,

[0072] d) optionally performing a data handling of the obtained physical parameters, obtaining data variables,

[0073] e) storing the physical parameters and/or data variables and/or model parameters,

[0074] f) optionally providing model parameters from data variables of the sample,

[0075] g) obtaining physical parameters and/or data variables and/or model parameters from a trained classification system,

[0076] h) correlating physical parameters and/or data variables and/or model parameters from the sample with physical parameters and/or data variables and/or model parameters of the trained system, and

[0077] i) displaying characterisation class(es) of the sample.

[0078] In yet another aspect the invention relates to a method for characterising a biological sample of an animal, including a human, comprising

[0079] a) obtaining a biological sample from the animal or human,

[0080] b) exposing the sample to excitation light,

[0081] c) determining the physical parameter(s) of light emitted from the sample,

[0082] d) performing a data handling of the obtained physical parameters, obtaining data variables,

[0083] e) storing the physical parameters and/or data variables and/or model parameters,

[0084] f) optionally providing model parameters from data variables of the sample,

[0085] g) obtaining physical parameters and/or data variables and/or model parameters from a trained classification system,

[0086] h) correlating physical parameters and/or data variables and/or model parameters from the sample with physical parameters and/or data variables and/or model parameters of the trained system, and

[0087] i) displaying characterisation class(es) of the sample.

[0088] In yet another aspect the invention relates to a method for characterising a biological sample of an animal, including a human, comprising

[0089] a) obtaining a biological sample from the animal or human,

[0090] b) exposing the sample to excitation light,

[0091] c) determining the physical parameter(s) of light emitted from the sample,

[0092] d) performing a data handling of the obtained physical parameters, obtaining data variables,

[0093] e) storing the physical parameters and/or data variables and/or model parameters,

[0094] f) providing model parameters from data variables of the sample,

[0095] g) obtaining physical parameters and/or data variables and/or model parameters from a trained classification system,

[0096] h) correlating physical parameters and/or data variables and/or model parameters from the sample with physical parameters and/or data variables and/or model parameters of the trained system, and

[0097] i) displaying characterisation class(es) of the sample.

[0098] Thus, the comparison of the sample and the classification information in the trained classification system can be carried out on different levels of data, namely by comparing either the physical parameters and/or the data variables and/or the model parameters. It is likewise conceivable that two of the levels of data or all three levels can be used in the comparison of the biological sample to the classification information in the trained classification system.

[0099] According to the first aspect of the invention, namely the method of training a classification system, step b), which relates to obtaining classification information related to each biological sample, can be carried out at any point in time as long as the information is available for the last step (step h) of the training method.

[0100] According to a preferred embodiment of the three aspects of the invention, the model parameters are latent variables being weighted averages of the data variables.
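The notion of a latent variable as a weighted combination of the data variables can be sketched with the first principal component of a small data set. Principal component analysis is used here because PCA models are referred to in the figure descriptions; the power-iteration routine, the function name and the example numbers are assumptions chosen to keep the sketch self-contained, and the weights are normalised to unit length rather than to sum to one.

```python
import math

def first_latent_variable(samples, iterations=200):
    """Return (weights, scores) of the first principal component.

    samples is a list of rows, one row of data variables per sample.
    Each score is the weighted sum of the mean-centred data variables.
    """
    n = len(samples)
    m = len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in samples]

    # Arbitrary non-constant start vector; a start that happens to be exactly
    # orthogonal to the first component would need a different choice.
    weights = [float(j + 1) for j in range(m)]
    for _ in range(iterations):
        scores = [sum(w * x for w, x in zip(weights, row)) for row in centred]
        updated = [sum(scores[i] * centred[i][j] for i in range(n)) for j in range(m)]
        norm = math.sqrt(sum(v * v for v in updated))
        if norm == 0.0:
            break
        weights = [v / norm for v in updated]

    scores = [sum(w * x for w, x in zip(weights, row)) for row in centred]
    return weights, scores

# Hypothetical data: the second variable is always twice the first,
# the third carries no variation.
samples = [
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],
    [3.0, 6.0, 0.0],
    [4.0, 8.0, 0.0],
]
weights, scores = first_latent_variable(samples)
```

For this example the weights come out proportional to (1, 2, 0), so a single score per sample summarises all the variation in the three data variables.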

[0101] The method is preferably carried out in a classification system trained according to the present invention.

DRAWINGS

[0102] FIG. 1. Fluorescence landscape of typical urine sample. Intensity is given as a function of excitation and emission wavelength.

[0103] FIG. 2. Three-dimensional score plot of latent variable (component) one versus two versus three. The 18 samples are labelled according to smoker/non-smoker (S/N) and person (number).

[0104] FIG. 3. Typical fluorescence excitation-emission landscape from a sample from a fasting person.

[0105] FIG. 4. A scatter plot of score one, two and five from a PCA model of 23 fasting and non-fasting persons.

[0106] FIG. 5. Score scatter plots from a PCA model of data from persons with benign tumors. The plots show score 1 versus 2, 1 versus 3 and 2 versus 3.

[0107] FIG. 6. A plot of the raw fluorescence data used in the analysis. The 28 spectra from each sample are arranged successively on an arbitrary wavelength scale.

[0108] FIG. 7. Influence-plot of samples in two latent dimensions (bold) and in three latent dimensions (ordinary font). The behaviour of sample five going from poor description (high residual) to high impact on the model (high leverage) is indicative of an outlying behaviour. This sample is not visible as an outlier in the plot of the raw data (FIG. 6).

[0109] FIG. 8. Score-plot showing the samples from the cardiac patients investigation in terms of the first principal component versus the second principal component.

[0110] FIG. 9. Unfolded variable averaged fluorescence spectra for the 8 samples. The first emission top corresponds to excitation at 230 nm and the last emission top corresponds to excitation at 500 nm.

[0111] FIG. 10A. PC1 vs. PC2 score plot from a PCA on auto scaled data. FIG. 10B. PC1 score from a PCA on auto scaled data.

[0112] FIG. 11A. PC1 vs. PC2 score plot from a PCA on mean centered data. FIG. 11B. PC1 score from a PCA on mean centered data.

[0113] FIG. 12A. Predicted vs. measured for log(concentration) with all bacteria samples (i.e. without control sample). FIG. 12B. Predicted vs. measured for log(concentration) without the control sample and the sample containing 10⁸ cells.

[0114] FIG. 13. Upper part: Front face fluorescence spectrum of an undiluted blood plasma sample. Lower part: Front face fluorescence spectrum of the same sample diluted 1:5000. Notice the different intensity scales.

[0115] FIG. 14A. Excitation 230 nm as a function of the dilution. 2 is diluted 1:5000, 3 is 1:3000, 4 is 1:2000, 5 is 1:700, 6 is 1:500, 7 is 1:200, 8 is 1:100, 9 is 1:50, 10 is 1:25, 11 is 1:10, 12 is 1:5, 13 is 1:2, 14 is undiluted sample. FIG. 14B. Excitation 250 nm as a function of the dilution. Same dilutions as in FIG. 14A. (Measured in front face mode).

[0116] FIG. 15A. Excitation 310 nm as a function of the dilution. Same dilutions as in FIG. 14A. FIG. 15B. Excitation 360 nm as a function of the dilution. Same dilutions as in FIG. 14A. (Measured in front face mode).

[0117] FIG. 16A. PC1 vs. PC2 score plot from a PCA. FIG. 16B. PC3 vs. PC4 score plot from a PCA.

[0118] FIG. 17. Upper part: Transmission fluorescence spectrum of an undiluted blood plasma sample. Lower part: Transmission fluorescence spectrum of the same sample diluted 1:5000. Notice the different intensity scales.

[0119] FIG. 18A. Excitation 230 nm as a function of the dilution. 2 is diluted 1:5000, 3 is 1:3000, 4 is 1:2000, 5 is 1:1000, 6 is 1:700, 7 is 1:500, 8 is 1:200, 9 is 1:100, 10 is 1:50, 11 is 1:25, 12 is 1:10, 13 is 1:5, 14 is 1:2, 15 is undiluted sample. FIG. 18B. Excitation 250 nm as a function of the dilution. Same dilutions as in FIG. 18A. (Measured in transmission mode).

[0120] FIG. 19A. Excitation 310 nm as a function of the dilution. Same dilutions as in FIG. 18A. FIG. 19B. Excitation 360 nm as a function of the dilution. Same dilutions as in FIG. 18A. (Measured in transmission mode).

[0121] FIG. 20A. PC1 vs. PC2 score plot from a PCA. FIG. 20B. PC3 vs. PC4 score plot from a PCA.

[0122] FIG. 21A. Predicted vs. measured for front face for 1:25 to 1:5000. FIG. 21B. Predicted vs. measured for transmission for 1:25 to 1:5000.

[0123] FIG. 22A. PC1 vs. PC2 score plot from a PCA on transmission samples. A, B, C, and D are different buffers with pH values of approx. 8.5-9.0, 7.0-7.5, 5.0-6.5, and 0.1 M HCl, respectively. Numbers like 1:2 indicate the dilution factor. FIG. 22B. PC1 vs. PC2 score plot from a PCA on front face samples.

[0124] FIG. 23. PC1 vs. PC2 from a PCA on all samples. F is front face and T is transmission.

[0125] FIG. 24. Unfolded fluorescence spectra at four different pH values for transmission mode samples diluted 1:10. The difference observed for 0.1 M HCl spectrum seen in insert B is not observed for the other dilutions.

DETAILED DESCRIPTION OF THE INVENTION

[0126] The invention relates to classification based on physical parameters obtained from the luminescence spectroscopy of light emitted from the sample. For practical reasons most of the discussion in this description relates to fluorescence spectroscopy. However, as described below, other physical parameters may be used in the classification. Thus, throughout the description the term fluorescence is used as an equivalent of any luminescence type and is to be interpreted as such, unless inappropriate in specific embodiments.

[0127] Fluorescence spectroscopy is an extremely sensitive tool. The data obtained from a spectrofluorimetric analysis can be considered a finger-print of the sample. Each sample gives rise to a unique spectrofluorometric set of physical parameters. However, as is described by the present invention, when analysing the fluorescence data it has become possible to classify samples into two or more classes based on the fluorescence spectra, if there is any systematic difference between the samples. The difference between the samples will mostly not relate to a single component or a few components of the sample, but rather to a combination of a wide variety of components. This combination exhibits a pattern so complex that it is detectable by multivariate analysis only.

[0128] Thus, according to the evaluation of the fluorescence parameters it is possible to obtain more information about a biological sample than it is when evaluating the various chemical components in the sample individually, i.e. it is possible to obtain inter-component information. Furthermore, there is no need to know the exact composition of components in the sample, as it is the fluorescence finger-print rather than the components of the sample that is detected. If so desired, in a specific application, it may be possible to give a chemical characterisation of the information used by the classification system. It may even in certain situations be possible to do so directly from the mathematical parameters derived from the physical parameters.

[0129] For one sample normally more than several thousand data variables are obtained. The amount of data increases with the number of samples used, but the number of data variables is constant for each sample. In the prior art it was common practice to discard most of the spectrofluorimetric information and use but a few selective or semi-selective physical parameters, whereas the present invention makes use of all available information.

[0130] By the present invention it has become possible to obtain information regarding an animal or a human being by subjecting a biological sample from said animal (human being) to a fluorescence analysis. Examples of the information provided by the present invention may be any information regarding health condition, such as information regarding presence/absence of a specific disease, group of diseases or risk of later attaining a specific disease or a body condition, or concentration of a specific compound or medicine.

[0131] In a first aspect the invention relates to a method of training a classification system for characterising a biological sample. It is the purpose of the training that a classification system is obtained, said system holding enough information to be used for characterising an un-classified and unknown biological sample into one of the classes of the classification system. By the term unknown is meant a sample for which no characterisation information is known.

[0132] It is also the purpose of the training of the system that this training incorporates a validation that substantiates how well classification can be performed on specific samples in the future as well as improving the validation specificity and sensitivity over time.

[0133] Samples

[0134] The biological sample may be any sample suitable for fluorescence analysis. The sample may be fluid or solid, as is appropriate. It is an object of the present invention to acquire the necessary information from the sample using as few pre-treatments as possible, preferably without any pre-treatments as such.

[0135] Accordingly, in a most preferred embodiment the sample is transferred directly from the animal or human being to be subjected to fluorescence analysis, in order to obtain data relating to fresh, un-treated samples. In case it is not possible to use the biological sample directly, it may be stored, for example by freezing the sample.

[0136] A characteristic of the biological sample is that it is preferably not directly related to the specific conditions, in that the spectroscopy is preferably not conducted on the tissue suspected to express the disease. It is thereby often possible to diagnose a condition or a disease in an easy manner, since the biological sample to be examined may be easily obtained, such as a urine sample or a plasma sample.

[0137] Fluid samples may be any fluid samples obtainable from animals or human beings, i.e. body fluids, such as biological samples selected from blood, plasma, serum, saliva, urine, cerebrospinal fluid, tears, nasal secretions, semen, bile, lymph, milk, sweat and/or faeces.

[0138] In a preferred embodiment the fluids are easily available fluids, such as urine samples, milk, blood and/or serum and/or plasma samples. Most preferred are urine, milk or saliva samples or any other samples that are obtainable without any invasive technique.

[0139] The fluid sample is subjected to fluorescence analysis without drying, and preferably without any other changes in concentration, such as separation and enrichment. The fluid sample may be arranged in a sample compartment being closed or open before exposing the sample to excitation light.

[0140] It is however also possible with the present invention to use tissue samples, such as solid tissue samples directly. Examples of tissue samples include hair and nails. The tissue sample may be any sample, such as a biopsy of tissue, that is subjected to fluorescence spectroscopy. In the present invention the tissue is not directly related to the specific condition(s), thus the term “the tissue sample is not associated with said condition(s)” means that for example when classifying with respect to cancer the tissue sample does not represent the possibly cancerous tissue, but tissue from another part of the individual.

[0141] The biopsy may be from any tissue, such as from muscle, cutis, subcutis, kidney, brain, and liver.

[0142] The solid samples may be classified in the solid form, but it may often be necessary to provide a liquid form of the tissue before subjecting the sample to fluorescence spectroscopy. The liquid form of the tissue may be obtained by dissolving the tissue or mechanically destroying the tissue, such as blending the tissue, to obtain a suitable liquid suspension of the tissue.

[0143] Furthermore, it is possible to use a sample positioned in situ or non-invasively, i.e. not removed from its normal environment. The sample does not need to be physically removed from its place in the animal or human being. The invention also encompasses the possibility of in situ analysis of samples. This can be done easily with samples like skin, hair, and nails, but it is likewise possible to conduct the excitation and fluorescence light beams by means of light guides to and from the liquid or tissue samples within the body. This may be accomplished by conducting the measurements transdermally. The light guides may thus be introduced into the body via body openings, such as the mouth, nose, ears, rectum, vagina, or urethra, or the light guides may be introduced through the blood vessels or inserted directly into tissue. In this way various fluids may be measured in situ, as well as some of the solid tissue samples. In a preferred embodiment the biological sample is selected from body fluids, hair and nails, more preferred from body fluids.

[0144] Excitation Light

[0145] The physical parameters may in principle be obtained for a wide variety of excitation light wavelengths. The wavelengths are preferably selected to be within the range of from 100 nm to 1000 nm, such as from 100 nm to 800 nm, more preferably within the range of from 200 nm to 800 nm, such as from 200 nm to 600 nm.

[0146] Normally several wavelengths are used, such as from 2 to 10,000, 4 to 10,000, 2 to 1000, 4 to 1000, or 2 to 100 wavelengths, such as from 4 to 100 wavelengths, for instance 2-30, such as from 4 to 30 wavelengths, such as 2-10, such as from 4 to 10 wavelengths, for instance 2-6 wavelengths, in order to describe an excitation-emission matrix optimally. Sets of wavelengths may be chosen so that each wavelength differs from the other by at least 0.1 nm, such as at least 0.5 nm, for instance at least 1 nm, such as at least 5 nm, for instance at least 10 nm, such as at least 50 nm, for instance at least 100 nm, such as at least 150 nm, for instance at least 250 nm, such as at least 500 nm, for instance at least 600 nm, such as at least 700 nm, and at most 750 nm.

[0147] Multiwavelength excitation may be established sequentially by varying the setting of a monochromator or other dispersing or filtering device in front of a continuous light source like a xenon lamp. Alternatively, the sample may be exposed to the full spectrum of a continuous light source equipped with a polychromator which disperses the light spatially. Thus, different zones of the sample are exposed to exciting light of different wavelengths. Furthermore, an array of single wavelength light sources, e.g. lasers or light guide bundles, may be used either in the sequential mode or in the spatially separated mode.

[0148] Accordingly, at least 2 excitation light wavelengths are selected, such as at least 4, at least 6, at least 8, at least 10, or more. The excitation light of each wavelength may be used simultaneously or sequentially. In a preferred embodiment 4 wavelengths are selected, such as excitation light having a wavelength of 230 nm, 240 nm, 290 nm, and 340 nm. Each sample is then subjected to excitation light of each wavelength. In another preferred embodiment 6 wavelengths are selected.

[0149] The predetermined excitation light wavelength(s) is provided by use of light sources as is known to a person skilled in fluorescence spectroscopy.

[0150] Emission

[0151] The determination of the various physical parameters is done by equipment known to the person skilled in the art.

[0152] In fluorescence spectroscopy emission light intensities at different wavelengths are recorded for each excitation light wavelength. Preferably the emission light is sampled at 0.5 nm intervals or 1 nm intervals. Thereby a matrix of excitation-emission data is obtainable for each sample. Normally the spectral distribution of light emitted from the sample ranges from 200 nm to 800 nm.
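How the sampled emission scans form a matrix, and how that matrix can be laid out as one row of values per sample, may be sketched as follows. The excitation wavelengths, the emission range and the sampling interval are taken from the text; the function names are hypothetical and the intensities are zero placeholders rather than measured data.

```python
def emission_grid(start_nm, stop_nm, step_nm):
    """Emission wavelengths from start to stop (inclusive) at a fixed interval."""
    count = int(round((stop_nm - start_nm) / step_nm)) + 1
    return [start_nm + i * step_nm for i in range(count)]

def build_eem(scans, excitation_nm):
    """Stack one emission scan per excitation wavelength into a matrix."""
    eem = [list(scans[ex]) for ex in excitation_nm]
    if len({len(row) for row in eem}) != 1:
        raise ValueError("all emission scans must share the same wavelength grid")
    return eem

def unfold(eem):
    """Place the emission spectra one after another to give a single vector."""
    return [value for row in eem for value in row]

excitation_nm = [230, 240, 290, 340]       # four excitation wavelengths
emission_nm = emission_grid(200, 800, 1.0)  # 1 nm sampling interval

# Placeholder intensities; a real instrument would supply measured values.
scans = {ex: [0.0] * len(emission_nm) for ex in excitation_nm}

eem = build_eem(scans, excitation_nm)
data_variables = unfold(eem)
```

With four excitation wavelengths and 1 nm sampling over 200 nm to 800 nm, the unfolded vector holds 4 × 601 = 2404 values for each sample.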

[0153] The emitted light is detected by any suitable detector, such as a one-dimensional detector, for example a photomultiplier. Alternatively, a scanning camera, a diode array, a CCD or a CMOS detector may be used, all of which can in principle be viewed as a two-dimensional array of several thousand or more detectors. The intensity of the light is detected on each detector, thus permitting the whole spectrum or the whole EEM to be obtained in a single electronic measurement.

[0154] The emitted light from the samples may be focused onto thedetectors by means of conventional focusing systems, as well as passingthrough diaphragms and mirrors.

[0155] Physical Parameters

[0156] Most frequently the physical parameter to be determined in order to perform a data analysis is the intensity as a function of excitation wavelength and/or the emission wavelength. However, any other information contained in photoluminescence may be obtained from the sample such as fluorescence lifetime, phosphorescence intensity, phosphorescence lifetime, polarisation, polarisation lifetime, anisotropy, anisotropy lifetime, phase-resolved emission, circularly polarised fluorescence, fluorescence-detected circular dichroism, and any time dependence of the two last mentioned parameters.

[0157] Fluorescence intensity is easily measured at room temperature, and may therefore be chosen for many of the samples. Furthermore, a great number of organic natural products are known to be fluorescent. Phosphorescence measurements may, however, also be performed at room temperature.

[0158] Luminescence lifetime in general, as well as phosphorescence lifetime, is defined as the time required for the emission intensity to drop to 1/e of its initial value.

[0159] When using phase-resolved fluorescence spectroscopy it is possible to suppress Raman and scattered light, leading to very good results for multicomponent systems.

[0160] In luminescence polarisation measurements, conventional spectra are obtained by scanning excitation spectra and measuring intensity parallel and perpendicular to the polarisation of exciting light. The polarisation may be calculated as the ratio of the difference of the two measurements to the sum of the two measurements. The anisotropy parameter is obtained by multiplying the perpendicular intensity by two in the denominator sum of this ratio.
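The two ratios described above can be written out as a minimal sketch. The function names and the intensity values are illustrative assumptions and are not taken from the specification; only the two formulas follow the text.

```python
def polarisation(i_par, i_perp):
    """P = (I_par - I_perp) / (I_par + I_perp)."""
    return (i_par - i_perp) / (i_par + i_perp)


def anisotropy(i_par, i_perp):
    """r = (I_par - I_perp) / (I_par + 2 * I_perp)."""
    return (i_par - i_perp) / (i_par + 2.0 * i_perp)


# Hypothetical intensities in arbitrary units, for illustration only.
i_par, i_perp = 3.0, 1.0
p = polarisation(i_par, i_perp)  # (3 - 1) / (3 + 1)
r = anisotropy(i_par, i_perp)    # (3 - 1) / (3 + 2 * 1)
```

Both quantities are dimensionless, so the units of the two intensity measurements cancel as long as they are recorded on the same scale.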

[0161] Processing

[0162] The detectors are preferably coupled to a computer for further processing of the data. The physical parameters measured or determined by the detector are processed to a form suitable for the further mathematical calculations. This is done by allocating data variables to each physical parameter determined, thus obtaining data variables related to the physical parameters.

[0163] The physical parameters determined are often subjected to a data analysis through the data variables, such as a one-way matrix of spectral information, a two-way matrix of spectral information, a three-way matrix of spectral information, a four-way matrix of spectral information, or a five-way or higher-order matrix of spectral information.

[0164] Characterisation Information

[0165] To obtain information relating to a specific condition in the animal or human being it is of importance that the data relating to the spectra obtained are correlated to characterisation information regarding the same biological sample. The information regarding the biological sample is preferably obtained substantially simultaneously with the biological sample; however, for characterisation data not varying essentially it is sufficient to obtain the data after having obtained the physical parameters.

[0166] The characterisation information relates to the classes intended to be generated through the training period.

[0167] The characterisation information relates, for instance, to the presence or absence of a physical condition, such as a specific disease, or to information regarding smoking, drinking, abuse of drugs, nutritive condition, etc. Furthermore, the characterisation information may include information such as sex, race, age or the like that is relevant for the classification. The characterisation information may also be information regarding responsiveness to a treatment as well as information regarding side effects of a treatment. The characterisation information may be both qualitative and quantitative.

[0168] Also predictive information regarding an individual's risk of acquiring a condition or disease may be obtained by the present invention. The training of the system may be conducted by subjecting a cohort of individuals to successive sample analysis and classifying the samples into groups of individuals acquiring the disease and groups staying healthy during the period of sampling.

[0169] The characterisation information must be correlated to the spectral information obtained from the sample, in order to obtain the trained system ready for testing unknown samples.

[0170] Validity

[0171] Each sample is subjected to fluorescence spectroscopy before the data analysis is performed in the training of the classification system. The sample may be one sample from each animal or human being, or several samples from the same individual, each sample obtained at a different time interval or from different fluids or from different instruments.

[0172] Depending on the classes to be identified when training the classification system it is of importance to train the system with a sufficient number of samples. The sufficient number of samples is primarily determined by the similarity of the fingerprints of different classes. Indirectly, this can often be related to the number of expected latent variables, wherein the latent variables are weighted averages of the data variables. It is preferred that the ratio of the number of training samples to the expected number of latent variables is at least 5:1, preferably at least 10:1. More preferably the ratio is 50:1, and even more preferably 100:1. The more training samples, the more reliable the system. Training is a continual improvement of the system, and any sample is also a training sample, although it is weighted decreasingly over time.

[0173] The samples being classified in each class are preferably a representative group of samples to allow the most reliable classification, wherein representative means exhibiting all variations influencing said classification. These variables can for example be age, sex, medication, existing disease, and race to match the population for which the classification system is designed.

[0174] Mathematics

[0175] A central aspect of the invention is the performance of a multivariate analysis, whereby the data variables relating to the physical parameters are evaluated and model parameters are obtained. The model parameters describe the variation of the data variables. Thereby the samples are classified uniquely into classes. The identification of the classes is obtained when each sample is correlated to the characterisation information relating to said sample. Correlation in this respect is not necessarily a mathematical correlation. Correlation in this respect may also comprise the possibility of performing a comparison of data or fluorescence spectra.

[0176] Preferably, the model parameters are latent variables being weighted averages of the data variables.

[0177] The identification of the belonging to a class is obtained when the data variables of a sample are input to a trained classification system yielding qualitative and/or quantitative information as to whether a sample belongs to a class.

[0178] In performing the data analysis it is often an advantage that the characterisation information is already available. Thereby it becomes possible to detect exactly those structures in the data that are relevant for detecting the difference between the classes and not just structures that may not be relevant for the classification.

[0179] The multivariate statistical methods suitable for the present invention are for example represented by chemometric methods like principal component analysis (PCA), partial least squares regression (PLS), soft independent modelling of class analogy (SIMCA) and principal variables (PV).

[0180] A non-exclusive list of other multivariate statistical methods includes: Principal component analysis¹⁴, principal component regression¹⁴, factor analysis², partial least squares¹⁴, fuzzy clustering¹⁶, artificial neural networks⁶, parallel factor analysis⁴, Tucker models¹³, generalized rank annihilation method⁹, locally weighted regression¹⁵, ridge regression³, total least squares¹⁰, principal covariates regression⁷, Kohonen networks¹², linear or quadratic discriminant analysis¹¹, k-nearest neighbors based on rank-reduced distances¹, multilinear regression methods⁶, soft independent modeling of class analogies⁸, robustified versions of the above and/or obvious non-linear versions such as one obtained by allowing for interactions or crossproducts of variables, exponential transformations etc.

[0181] The term “describing the variation of the data variables” means that the latent variables retain the relevant information regarding the variation, whereas “noise” preferably does not contribute any significant part to the latent variables.

[0182] As an example of a multivariate data analysis technique, the use of principal component analysis (PCA) will be outlined [Jackson 1991], as this technique will be used in the following exemplary applications. An I×J data matrix, $X \in \mathbb{R}^{I \times J}$, is given where I is the number of rows (samples) and J is the number of columns (variables). The number of variables will typically exceed the number of samples by far. This poses the practical problem that the matrix is typically ill-conditioned. Thus, any traditional analysis using the whole set of raw data will lead to useless results due to the numerical problems involved in handling the large amount of data.

[0183] Using PCA, the original J variables are replaced by F (<<J) latent variables which, in this case, are also called principal components. These latent variables are found as weighted averages of the original variables in such a way that they provide the best possible description of the data in a least squares sense. Each latent variable consists of a score vector t (I×1) and a loading vector p (J×1). The loading vector is constrained to norm one and the score vector is found by regressing X onto p

$t = Xp/(p^{T}p) = Xp$

[0184] For the first latent variable it holds that it minimizes

$\|X - tp^{T}\|_{F}^{2}$

[0185] where $\|\cdot\|_{F}^{2}$ denotes the squared Frobenius norm. Thus, the first latent variable provides the least squares best-fitting rank-one model of X. The second latent variable is found under the constraint that the second score vector t₂ is orthogonal to the first score t₁ and that the second loading vector p₂ is orthogonal to the first loading p₁. Under this restriction, the second latent variable is found such that it provides the best possible fit to the data. Extracting F such components will yield a rank F model of the data. Let the score matrix T (I×F) hold the score vectors t_(f), f=1, . . . , F and the loading matrix P (J×F) hold the loading vectors p_(f), f=1, . . . , F of this solution. It then holds that T and P provide the solution to

$\underset{G,R}{\arg\min}\,\|X - GR^{T}\|_{F}^{2}$

[0186] and thus provide the best-fitting rank F solution. In practice, the solution to this problem can be found using a truncated singular value decomposition of X. If U_(F) holds the first F left singular vectors, V_(F) holds the first F right singular vectors and S_(F) is an F×F diagonal matrix holding the first F singular values on its diagonal, then it holds that

$T = U_{F}S_{F}; \quad P = V_{F}.$
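The truncated singular value decomposition route can be sketched in a few lines of NumPy. The function name `pca_svd` and the random data are assumptions made for illustration; as in the text, no centering is applied here.

```python
import numpy as np


def pca_svd(X, n_components):
    """Return scores T (I x F) and loadings P (J x F) from a truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]  # T = U_F S_F
    P = Vt[:n_components].T                     # P = V_F
    return T, P


# Hypothetical data: 6 samples and 40 variables (J much larger than I).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 40))
T, P = pca_svd(X, n_components=2)
X_hat = T @ P.T  # rank-2 least squares approximation of X
```

Because the loadings are orthonormal, the scores can equally be recovered as `X @ P`, which is the regression of X onto the loadings described above.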

[0187] In order to choose the appropriate number of components, F, several strategies are possible. One approach is to use cross-validation [Wold 1978] in which elements are left out of the data in turn. For each set of elements left out, a model is fitted to the remaining data and the model

$\hat{X} = TP^{T}$

[0188] is used to estimate the left out elements. After all elements have been left out once, the thus obtained residuals are used for calculating the predicted residual sum of squares (PRESS) and the number F for which PRESS is at its minimum is usually taken to be the appropriate number of components. For exploratory purposes, it is usually sufficient to simply retain the first 2-5 components because these, per definition, retain most of the variation in X. It is noted that if cross-validation is to be performed in this way, special algorithms have to be used because of the missing values in the data [Grung & Manne 1998].

[0189] The practical usefulness of PCA arises because of the information-preserving compression of the data based on the empirical observations rather than on theoretical derivations. The scores T can be seen as the coordinates of X in the reduced space defined by the truncated basis P and the latent variables therefore provide a condensation of the original J variables into F new ones. This condensed representation is useful because it allows a holistic visualization of the structure in the data and because it makes it possible to do quantitative analysis such as regression and classification in a straightforward way.

[0190] In some situations, the interest is to specifically make a quantitative model relating multivariate data to one or more responses by a regression model. This way, it is possible to measure future samples by the multivariate approach and then predict the response from the regression model. Such a regression problem suffers from the same problems as outlined above and for the same reason rank-reduced regression is often employed. As an example of such, the partial least squares regression (PLS) method will be described.

[0191] As before a multivariate set of data X (I×J) is available and further a response vector y (I×1) is given. More responses can be handled as well, but this is not pursued here. The aim is to find a regression vector b that provides a feasible solution to the regression problem

y=Xb+e

[0192] where e (I×1) is a vector of unmodelled residuals. Using multiple linear regression (J≦I) or similar approaches, it is possible to obtain a minimum variance unbiased estimate of b, but due to the constraint of being unbiased, the variance will, in practice, make the estimate useless for predictions in the situations considered here [de Jong 1995, Martens & Naes 1987, Wold et al. 1984]. Instead, a regression vector is sought that yields a low mean squared error, hence relaxing the restriction of being unbiased and focusing on low total error. In PLS, this is achieved by extracting components sequentially such that each extracted score vector has maximal covariance with the yet unexplained variation in the response [Bro 1996, Martens & Naes 1989]. Usually X and y are centered by subtracting the column average from each column, thereby removing possible offsets. For centered X and y, the first component is determined by defining a weight vector as

$w = \frac{X^{T}y}{\|X^{T}y\|_{F}}.$

[0193] From this vector, the score vector t is defined as

t=Xw

[0194] and finally a loading vector p is defined as

$p = X^{T}t/(t^{T}t).$

[0195] The rank-one model of X is then given by tp^(T) and the regression model relating the bilinear model of X to y is defined by the scalar

$r = y^{T}t/(t^{T}t).$

[0196] giving the initial prediction of y as tr. The model of X (tp^(T)) and the prediction of y (tr) are subtracted from X and y and the following component is determined similar to above but using the residuals of X and y as input. After calculating F components in this manner, the following matrices and vectors are given: T (I×F), P (J×F), W (J×F), r (F×1). The regression vector can then be determined as

$b = W(P^{T}W)^{-1}r.$
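The sequence of steps above (weight vector, score, loading, scalar regression coefficient, deflation, and the final regression vector) can be sketched for a single response as follows. The function name `pls1_fit` and the simulated data are assumptions for illustration only, not part of the specification.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS following the steps in the text; returns b (J,)."""
    Xr = X - X.mean(axis=0)          # centre columns of X
    yr = y - y.mean()                # centre y
    J = Xr.shape[1]
    W = np.zeros((J, n_components))
    P = np.zeros((J, n_components))
    r = np.zeros(n_components)
    for f in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)    # w = X^T y / ||X^T y||
        t = Xr @ w                   # t = X w
        tt = t @ t
        p = Xr.T @ t / tt            # p = X^T t / (t^T t)
        r[f] = yr @ t / tt           # r = y^T t / (t^T t)
        Xr = Xr - np.outer(t, p)     # subtract the model of X
        yr = yr - t * r[f]           # subtract the prediction of y
        W[:, f], P[:, f] = w, p
    return W @ np.linalg.solve(P.T @ W, r)  # b = W (P^T W)^{-1} r


# Hypothetical data: 20 samples, 3 variables, known linear relation plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=20)
b = pls1_fit(X, y, n_components=3)
```

In this small full-rank example, using as many components as there are variables makes the result coincide with ordinary least squares on the centered data, which is a convenient sanity check; in the situations described in the text, far fewer components than variables would be used.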

[0197] As for PCA, cross-validation is usually employed to determine the optimal rank of the model with the only difference being that whole samples are excluded in each cross-validation segment, and the residual error determined is the response error.
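A minimal sketch of such a cross-validation is given below, assuming segments of a single sample each (leave-one-out), which is only one possible segmentation. The helper names, the simulated data and the maximum rank of 4 are illustrative assumptions.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Compact single-response PLS fit; returns (b, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, r = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w = w / np.linalg.norm(w)
        t = Xr @ w
        p = Xr.T @ t / (t @ t)
        q = yr @ t / (t @ t)
        Xr = Xr - np.outer(t, p)
        yr = yr - t * q
        W.append(w)
        P.append(p)
        r.append(q)
    W, P, r = np.array(W).T, np.array(P).T, np.array(r)
    return W @ np.linalg.solve(P.T @ W, r), x_mean, y_mean


def press_by_rank(X, y, max_components):
    """Leave-one-sample-out PRESS of the response for ranks 1..max_components."""
    press = np.zeros(max_components)
    for i in range(X.shape[0]):
        keep = np.arange(X.shape[0]) != i   # exclude one whole sample
        for f in range(1, max_components + 1):
            b, x_mean, y_mean = pls1_coefficients(X[keep], y[keep], f)
            y_pred = (X[i] - x_mean) @ b + y_mean
            press[f - 1] += (y[i] - y_pred) ** 2
    return press


# Hypothetical data: 15 samples, 30 variables, response driven by two variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(15, 30))
y = X[:, :2] @ np.array([1.0, -1.0]) + 0.05 * rng.normal(size=15)
press = press_by_rank(X, y, max_components=4)
best_rank = int(np.argmin(press)) + 1
```

The centering is recomputed inside every segment so that the left-out sample does not influence the model used to predict it.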

[0198] Other Variables

[0199] The classification system may be obtained on the spectral information only. However, in some situations it may be appropriate to incorporate other variable(s) in the multivariate analysis.

[0200] These other variables may be variables relating to the sample supplying the spectral information or they may be variables that compensate for a specific condition of the sample.

[0201] Examples hereof may be the measurement of pH, electrolytes, or temperature in the sample before subjecting it to spectroscopy, or clinical parameters. Thereby variations in the other variables may be compensated for in the final classification.

[0202] In another embodiment other variables are variables relating to the animal, including a human being, to be characterised. Non-exclusive examples of these variables are hair colour, skin colour, age, sex, geographic origin, affiliation, prior diseases, hereditary background, medication intake, body conditions (such as e.g. surgery), stress level, medical diagnoses, subjective evaluations, and other diagnostic tests (e.g. immunoassays, x-ray diagnosis, genomic information or an earlier chemometric test).

[0203] Pre-Treatment

[0204] It is an advantage of the present invention that no pre-treatment of the sample is normally necessary.

[0205] However, for some of the samples or applications it may be necessary or convenient to perform an adjustment before subjecting the sample to spectroscopy.

[0206] Examples of pre-treatment may be adjustment of pH of the sample to a predetermined value, or heating or cooling the sample to a predetermined temperature. The sample may be treated with chemicals (complexing agents etc.) in order to develop e.g. fluorescent complexes involving inherent non-fluorescent molecules in the sample.

[0207] Other types of pre-treatment include addition of chemical substances, measurement under a gradient imposed by varying additions of chemical substances, and simple chromatographic pre-treatments based on either chemical or physical separation principles.

[0208] Classification System

[0209] Another aspect of the present invention is the classification system for characterising a biological sample into at least one predetermined class.

[0210] When the classification system has been trained as discussed above, it is ready for classifying samples with unknown characteristics. The classification system preferably comprises the following components:

[0211] a) a sample domain for comprising a biological sample,

[0212] b) light means for exposing the sample to excitation light in the sample domain,

[0213] c) a detecting means recording the physical parameter(s) of light emitted from the sample,

[0214] d) optionally computing means for performing data handling of the physical parameters, obtaining data variables,

[0215] e) optionally processing means for providing model parameters from data variables of the sample,

[0216] f) at least one storage means for storing physical parameters and/or data variables and/or model parameters of the biological sample,

[0217] g) at least one storage means for storing physical parameters and/or data variables and/or model parameters and characterisation information of a trained classification system,

[0218] h) means for correlating physical parameters and/or data variables and/or model parameters from the sample with physical parameters and/or data variables and/or model parameters of the trained system, and

[0219] i) means for displaying the characterisation class(es) of a sample.

[0220] The sample domain may be a sample chamber for accommodating a container with a liquid, a solid or a semi-solid sample. However, the sample domain may also be a domain in the individual to be classified in that the analysis can be performed on a sample in situ such as in the blood vessels or on the superficial body parts such as skin, nails, or hair.

[0221] The classification system may be provided as a whole unit, wherein the spectroscopy of the sample is conducted by the same unit from where the data relating to the characterisation classes of the sample is displayed.

[0222] It is however contemplated within the scope of the present invention that the system is comprised of at least two units, wherein one unit is performing the steps a) to f), and another unit is performing the steps g) to i). Other units comprising other parts of the system are also contemplated, such as one unit performing the steps a) to d) and storage means for storing physical parameters and/or data variables from f) or a) to g), and the other unit comprising the rest of the parts. Yet another arrangement comprises steps a) to c) in one unit and the remaining steps in the other unit.

[0223] By the system thus divided into at least two units, it is possible to obtain the spectroscopic information from a wide variety of decentral locations and perform the processing centrally. The data or the classification system may then be transmitted by any suitable means, such as conventional data transmission lines, for example the telephone lines, or via internet or intranet connections.

[0224] This facilitates the use of the classification system since any physician may provide the biological sample, have it subjected to spectroscopic analysis at his or her clinic and have the data correlated decentrally without the need of being capable of conducting this processing. The physician may then call the central unit to request the correlation and classification of the sample data. Depending on the transmission mode and equipment, the result may be displayed on a screen, printed on paper, or communicated by telephone.

[0225] In addition to the result, other information may be provided, such as information regarding sample errors (for example, that the test requires a urine sample, not a serum sample) and information about the statistics, such as fuzziness, the degree of membership of a group, power and significance.

[0226] Diagnosis

[0227] In principle the classification system trained according to the invention may be used to characterise any biological sample with respect to any kind of information.

[0228] Interesting parts of the present invention relate to the possibilities of diagnosing a condition or the risk of acquiring a condition, such as a physical condition in an animal or a human being, from a spectrofluorimetric analysis of a biological sample from said animal or human being and relating the spectroscopic data with data in the classification system.

[0229] As with any other diagnostic tool, the present invention may give a strong indication of a disease or condition or a risk of such disease or condition, but for many diagnoses the indication may have to be confirmed by more specific diagnostic methods, more precisely directed to the specific diagnostic area. However, due to the simplicity of the present invention, the precise diagnosis may be obtained faster and much more cost-effectively than by hitherto known methods.

[0230] The disease detected may be any disease that provides a combination of components in the biological sample that is detectable as a pattern by the fluorescence spectroscopy.

[0231] Thus the disease may be selected from any official disease classification system, such as, but not limited to, ICD-9/10 (WHO's official international classification list) or ICIDH-2 (International Classification of Functioning and Disability). Such classification systems include at least the following groups (the numbers in brackets refer to the ICD-9 list) of human diseases as well as similar diseases related to other animals:

[0232] Infectious and parasitic diseases (001-139)

[0233] Neoplasms (140-239)

[0234] Endocrine, nutritional and metabolic diseases, and immunity disorders (240-279)

[0235] Diseases of the blood and blood-forming organs (280-289)

[0236] Mental disorders (290-319)

[0237] Diseases of the nervous system and sense organs (320-389)

[0238] Diseases of the circulatory system (390-459)

[0239] Diseases of the respiratory system (460-519)

[0240] Diseases of the digestive system (520-579)

[0241] Diseases of the genitourinary system (580-629)

[0242] Complications of pregnancy, childbirth, and the puerperium (630-677)

[0243] Diseases of the skin and subcutaneous tissue (680-709)

[0244] Diseases of the musculoskeletal system and connective tissue (710-739)

[0245] Congenital anomalies (740-759)

[0246] Certain conditions originating in the perinatal period (760-779)

[0247] Injury and poisoning (800-999)

[0248] The sample may be classified to belong to a class for any of the diseases above, quickly leading the examining physician to the most likely diagnosis. The sample classification may be confirmatory or it may have to be confirmed by more specific diagnostic tools.

[0249] Furthermore, the sample may be classified into more than one class, whereby a more refined diagnostic tool is provided.

[0250] For many of the diseases mentioned above, it is of utmost importance that an early diagnosis is obtained, but many of these diseases may be difficult to diagnose conventionally at the early stage due to very discreet symptoms. By the present method it is possible to get a clear indication of the disease at an early stage.

[0251] Furthermore, by the present invention it may be possible to reveal individuals susceptible to a specific disease, due to the classification of relevant biological samples from these individuals.

[0252] In particular in respect of cancer, the invention may be used to classify different forms of cancer, including cancer in various organs. Thus, the invention may be used to distinguish renal cancer from colon cancer, for example. Furthermore, different stages of a cancer may be diagnosed, including precancerous stages. Furthermore, different degrees of cancer aggressivity may be diagnosed; for example, the invention may identify high-risk cancer patients independently of whether the underlying cancer is anatomically localised in, for example, breast, lung or colon.

[0253] It is likewise conceivable that the present invention can be used for screening of individuals to identify those suffering from a particular disease or those being susceptible to a disease or those expected to suffer from the disease in the near future.

[0254] Also, the present invention may reveal individuals at risk due to environmental hazards, job environment or the like.

[0255] The present invention may also be used to diagnose a variety of abuse of medicine and/or narcotics, for example in relation to control, or in unconscious or semi-conscious individuals that have to be treated for their abuse.

[0256] Another aspect of the invention may be to detect physical and/or psychological stress in an individual, and thereby detect persons at risk of acquiring stress-related diseases.

[0257] Yet another aspect of the invention may be the detection of genetic modifications or inherited risk by examining a biological sample by fluorescence spectroscopy.

[0258] Yet another aspect of the invention may be to provide a quantitative answer to the degree of any of the above-mentioned situations (e.g. the amount of medicine, the degree of risk etc.).

[0259] In a further embodiment the invention may be used as a tool for predicting the responsiveness to a specific treatment for an individual suffering from a particular disease. For example the invention may be used to classify individuals suffering from cancer into classes of predicted responsiveness to chemotherapy, radiation and/or surgery. Another example may be to predict useful medication for depressive individuals.

EXAMPLES

Example 1

[0260] Example of the Use of Multidimensional Sensorial Fluorescence Data Analysis

[0261] Smokers/Non-Smokers Data

[0262] This example illustrates the usefulness of training a classification system on a small experimental data set. The data set comprises 18 samples taken from 14 individuals.

[0263] Sampling and Measurements

[0264] Urine was collected from several male persons and measured spectrofluorimetrically. Approximately half of the test subjects were smokers, the other half non-smokers. Some samples were measured the same day, others up to 4 days after sampling. The samples were kept at −4° C. until measurement was performed. The urine was not diluted before measurement. Fluorescence spectra were recorded on a Perkin-Elmer LS-50B spectrofluorimeter using front face illumination.

[0265] Scanning was performed from excitation wavelengths 230 nm to 500 nm and from emission wavelengths 268 nm to 900 nm.

[0266] Data Handling

[0267] For each thus obtained fluorescence landscape, the obvious non-bilinear parts (emission below excitation and zero- and first-order Rayleigh scatter) were removed and the corresponding elements denoted ‘missing’. This led to data of the type shown in FIG. 1.
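Marking such elements as missing can be sketched as below. This is an illustration under stated assumptions, not the procedure actually used: the function name, the 15 nm half-width of the scatter band and the wavelength grids are hypothetical, and only the band around emission equal to excitation is masked. Other scatter orders would be handled analogously with a different band centre.

```python
import numpy as np


def mask_non_bilinear(eem, em_nm, ex_nm, scatter_halfwidth_nm=15.0):
    """Return a copy of eem (n_em x n_ex) with non-bilinear elements set to NaN.

    Masked: emission below excitation, and a band around emission == excitation.
    The band half-width is a hypothetical value chosen for illustration.
    """
    em = np.asarray(em_nm, dtype=float)[:, None]
    ex = np.asarray(ex_nm, dtype=float)[None, :]
    below_excitation = em < ex
    scatter_band = np.abs(em - ex) <= scatter_halfwidth_nm
    out = np.array(eem, dtype=float, copy=True)
    out[below_excitation | scatter_band] = np.nan  # denote as 'missing'
    return out


# Hypothetical grids spanning the ranges mentioned in the example.
ex_nm = np.arange(230.0, 501.0, 10.0)   # excitation wavelengths
em_nm = np.arange(268.0, 901.0, 1.0)    # emission wavelengths
eem = np.ones((em_nm.size, ex_nm.size))
masked = mask_non_bilinear(eem, em_nm, ex_nm)
```

Using NaN as the missing marker keeps the landscape on its original grid, so the masked elements can be dropped or handled by a missing-value algorithm at a later stage.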

[0268] For each sample, i, a matrix X_(i) is thus obtained of size J×K with J being the number of emission wavelengths and K being the number of excitation wavelengths. The whole data set is arranged in a three-way tensorial structure with typical elements X_(ijk), i=1, . . . , I, j=1, . . . , J, k=1, . . . , K. This three-way structure may, geometrically, be interpreted as a box of data, where each horizontal slice corresponds to a specific sample, each vertical slice corresponds to a specific emission wavelength, and each frontal slice corresponds to a specific excitation wavelength.

[0269] In the following, this three-way tensorial array is matricized, i.e. rearranged into a two-way matrix called Z where each row corresponds to a sample and holds all combinations of excitation and emission. In this setup, interpreted as in ordinary multivariate data analysis, there are 4003 variables (plus a fraction that is removed because it contains either variables which are set to missing or variables with extremely small variance).
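The rearrangement can be sketched as a reshape followed by column selection. The function name, the variance threshold and the toy array are assumptions; in particular, dropping every column that contains any missing value is one plausible reading of the removal step, not a confirmed detail.

```python
import numpy as np


def matricize(X3, min_variance=1e-12):
    """Unfold X3 (I x J x K) into Z (I x n_kept) and return the column mask.

    Columns containing missing values (NaN) or with variance at or below
    min_variance (a hypothetical threshold) are removed.
    """
    n_samples = X3.shape[0]
    Z = X3.reshape(n_samples, -1)             # one row per sample, J*K columns
    has_missing = np.isnan(Z).any(axis=0)
    variance = np.zeros(Z.shape[1])
    variance[~has_missing] = Z[:, ~has_missing].var(axis=0)
    keep = ~has_missing & (variance > min_variance)
    return Z[:, keep], keep


# Toy array: 4 samples, 5 emission and 3 excitation wavelengths.
rng = np.random.default_rng(0)
X3 = rng.normal(size=(4, 5, 3))
X3[:, 0, 0] = np.nan    # an element denoted 'missing' in every sample
X3[:, 1, 1] = 2.0       # a variable with zero variance
Z, keep = matricize(X3)
```

Returning the boolean mask alongside Z makes it possible to apply exactly the same column selection to a new sample before it is projected onto a trained model.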

[0270] Data Modelling

[0271] This data matrix is subjected (Matlab, version 5.2) to principal component analysis, in which the 4003 data variables are replaced with three latent variables. These latent variables are weighted averages of the original data variables, defined so that the projection of Z onto the space spanned by these retains as much variation as possible. That is, the latent variables T of size 18×3 are defined through a set of weights, P (4003×3), as the solution to

$\max\limits_{P}\|ZP(P^{T}P)^{-1}P^{T}\|_{F}^{2}$

[0272] where $\|\cdot\|_{F}$ denotes the Frobenius norm. Because the weight matrix is chosen to be orthogonal (without loss of generality), this expression can be reduced to

$\max\limits_{P}\|ZPP^{T}\|_{F}^{2}$

[0273] From this expression, the latent variables T are found as the coordinates of Z in the reduced space defined by the truncated basis P

T=ZP.

[0274] Note that the latent variables are found using no information about the status of the persons (smoker or non-smoker). The latent variables provide a condensed picture of the original 4003 variables, and a simple graphical representation of these is sufficient to illustrate the power of this compression (FIG. 2).

[0275] As can be seen in the plot, all non-smokers fall above the dashed line and all but one smoker fall below the dashed line. Disregarding for now the one outlying sample (S3), it is easily seen that this simple plot provides a powerful tool for assessing whether a person is a smoker or not. Simply by measuring the fluorescence excitation-emission data from a urine sample, and projecting the obtained data onto the factor weights found above, a set of scores on the three latent variables is obtained for a new person. When these are plotted in the above plot, the position of the sample above or below the dashed line enables an assessment of whether the person is a smoker or not. In fact, the sample of the person identified as S2 in the lower right part was left out of the initial analysis and only positioned after the plot was generated. As can be seen, the position is correctly below the dashed line, indicating that the person is a smoker. This graphical assessment can be automated in a number of ways using appropriate pattern recognition techniques.
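Positioning a new sample in an existing score plot can be sketched as follows. Everything numeric here is hypothetical: the data are random, and the line coefficients `a`, `b`, `c` merely stand in for the dashed line of FIG. 2, whose actual position is not given in the text.

```python
import numpy as np

# Hypothetical training matrix (17 samples) and one new, left-out sample.
rng = np.random.default_rng(2)
Z = rng.normal(size=(17, 200))
z_new = rng.normal(size=200)

# Factor weights P from the training samples only (three latent variables).
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
P = Vt[:3].T          # 200 x 3, orthonormal columns
T = Z @ P             # scores of the training samples
t_new = z_new @ P     # scores of the new sample on the same weights


def side_of_line(scores, a, b, c):
    """Report on which side of the line a*t1 + b*t2 + c = 0 the scores fall."""
    return "above" if a * scores[0] + b * scores[1] + c > 0 else "below"


# Hypothetical line coefficients, standing in for a line drawn in the plot.
position = side_of_line(t_new, a=0.0, b=1.0, c=0.0)
```

The new sample does not influence P, which mirrors the situation described above in which a sample is positioned only after the model and plot have been generated. A fixed line is only the simplest possible decision rule; a discriminant method trained on the scores would be a natural way to automate it.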

[0276] The one smoker sample located above the dashed line indicates an erroneous classification. However, in this initial feasibility study, no detailed information on the individual persons or their smoking habits was available. Hence, there can be numerous reasons for this particular position, for example that the person had not been smoking that particular day or the day before.

Example 2

[0277] Detecting Fasting Condition

[0278] A patient undergoing a surgical procedure under general anesthesia is exposed to a relatively high risk if he or she has been eating or drinking before being intubated. For a planned procedure, patients are instructed not to eat or drink before the surgical procedure. However, patients (especially children or elderly persons) will not always refrain from doing so. In both planned and acute procedures it is helpful for the physician to know whether a patient has been eating or not. Thus, such a test will be a feasible ‘add-on’ to other tests with low marginal cost.

[0279] Samples and Solutions

[0280] This study includes 9 normal persons fasting and 14 normal persons not fasting. The conditions for all 23 persons were identical except for the question of whether the persons had been fasting or not.

[0281] For each person a blood sample was taken and blood plasma therefrom frozen. The blood plasma samples were defrosted and measured at room temperature. The samples were measured undiluted, front face, in a 1 mm cuvette on a Perkin Elmer LS50B (Copenhagen University). The excitation wavelength interval range was 230-400 nm (10 nm steps) and the excitation and emission slits were 4 nm and 3 nm, respectively. The scan rate was 1000 nm/min. In all, a total of 3834 variables were measured for each sample. At each excitation wavelength, the emission was removed in the range from 250 to 22 nm above the excitation wavelength in order to remove Rayleigh scatter and other irrelevant phenomena [Bro 1998, Bro 1999]. Upon removal of these variables, a total of 2020 variables were retained. A typical landscape is shown in FIG. 3.

[0282] Results

[0283] The data were fitted by a PCA model. The model indicated that at least up to six components contained valid information. For the present purpose, this means that the main systematic part of the fluorescence variation is retained in these six new variables. A resulting score plot is shown in FIG. 4. It is a three-dimensional scatter plot of scores one, two and five. Each point represents the relative position of one person with respect to that person's fluorescence fingerprint. It is immediately seen in the plot that all fasting persons appear to the right in the plot and all non-fasting persons appear to the left.

[0284] The significance of this particular plot can be described as follows. In estimating the six components in the PCA model, no use whatsoever has been made of the fasting information. The PCA model is based only on the fluorescence data. The empirical observation that it is possible to assign areas of the plot to only fasting persons and areas to only non-fasting persons means that a discrimination between the two groups has been achieved. Thus, for a person where it is unknown whether the person is fasting, it is possible to measure a corresponding fluorescence landscape under similar conditions and thereby obtain the scores for that particular person. Inserting these scores in the plot above, it is then possible to evaluate or verify whether the person is fasting or not by simply monitoring to which side of the indicated line the point is positioned.

[0285] More elaborate decision rules can easily be envisioned using e.g. linear discriminant analysis [Indahl et al. 1999], SIMCA [Wold & Dunn, III 1983] or some similar classification approach. However, for this feasibility study it suffices to show that discrimination is possible to achieve with multivariate analysis of fluorescence landscapes.

Example 3

[0286] Analysis of Colon Cancer Data

[0287] Having a simple tool for detecting colon cancer is a very interesting application of the current invention.

[0288] Materials and Methods

[0289] The data gathered here include 77 samples (9 normal persons; 13 with benign tumor; 11 Dukes A; 14 Dukes B; 15 Dukes C; 15 Dukes D). For each person a blood sample was taken and blood plasma therefrom frozen. The blood plasma samples were defrosted and measured at room temperature. The samples were measured undiluted, front face, in a 1 mm cuvette on a Perkin Elmer LS50B (Copenhagen University). The excitation wavelength interval range was 230-400 nm (10 nm steps) and the excitation and emission slits were 4 nm and 3 nm, respectively. The scan rate was 1000 nm/min. In all, a total of 3834 variables were measured for each sample. At each excitation wavelength, the emission was removed in the range from 250 to 22 nm above the excitation wavelength in order to remove Rayleigh scatter and other irrelevant phenomena [Bro 1998, Bro 1999]. Upon removal of these variables, a total of 2020 variables were retained.

[0290] Results

[0291] For each class of samples, a principal component analysis (PCA) model is fitted to the fluorescence data. This is important for exploring the homogeneity of the group and for eliminating obviously erroneous samples.

[0292] As an example, a PCA model of the data of persons with benign tumors is discussed. In this group, there seem to be several individuals located distinctly isolated (4490, 4499, 8319, 4506) in the score plots (FIG. 5). The remaining persons are situated in the same group in the score plot. The reasons for the behavior of the outlying samples can be related to the patients (extreme patients in some sense), to the sampling of the blood (extreme sampling in some sense) or to the actual measurements. For example, the person with sample 4490 was later found to be incorrectly classified as benign. For 4506 the technician noted that the suspension was cloudy, indicating incorrect treatment of the sample. For 8319 the sample was noted to have precipitated matter, whereas for 4499 no reason was found for its behavior, besides the outlying behavior of the measured fluorescence data. In order to ensure that the subsequent results are as robust as possible, the four samples were excluded; three because of erroneous sample treatment or measurement and the fourth because of assumed but unknown erroneous sample treatment or measurement.

[0293] Similar outlying samples were found in other groups as well. For example, in the Dukes A group one sample had a very different fluorescence pattern and was excluded. For the Dukes B samples, four such samples were observed. For Dukes C only sample VB106 is moderately outlying. For Dukes D, sample VB76 is moderately outlying. All in all, 11 samples out of the 77 were excluded. An explanation in terms of erroneous sample treatment or measurement was found for most of these samples, which makes the decision to exclude them valid and reasonable. It must, however, be borne in mind that for five of the samples no explanation was found for the strange behavior. Hence, excluding these from the subsequent classification model is somewhat hazardous, because similar correct samples might be anticipated in real applications. Nevertheless, for the present feasibility study, the samples are considered as outliers of which the cause is presently unknown.

[0294] In order to quantify how well these data can be used for screening for cancer, the data were split into two groups: persons without cancer (18) and persons with cancer (48). A cross-validation was performed in the following way. One person was left out in turn and subsequently a PCA model was fitted to each group of data. Thus two PCA models were built. For the non-cancer group, five components were used and for the cancer group seven components were used. The data from the left-out person were subsequently fitted to the two independently obtained models, yielding 1) a set of score values for the sample and 2) a set of residuals of fluorescence variation of that sample that the model could not explain. For one model, the score values of the new sample, t (1×F), and the scores from the calibration data, T (I×F), are used for calculating the T² statistic as T²=t(T^(T)T)⁻¹t^(T) and the Q statistic as e^(T)e, where e (J×1) is the vector of residual variation in the fluorescence data not explained by the model. The ratios of these values to the corresponding confidence limits obtained from the model are calculated (hence a value above one indicates that the sample is different). These two ratios are squared and summed and the square root is used to test for class membership. If this number is less than the square root of two, the sample is assigned to the class. If the sample is assigned to both classes, the one with the smallest number is the one chosen.
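The class-membership test described above can be sketched as follows. This is a hedged illustration rather than the implementation used in the study: the confidence limits `t2_limit` and `q_limit` are taken as given inputs because the text does not state how they were computed, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def class_distance(x, mean, P, T_cal, t2_limit, q_limit):
    """Combined T2/Q distance of sample x to one class model (loadings P, scores T_cal)."""
    z = np.asarray(x, dtype=float) - mean
    t = z @ P                                      # scores of the new sample (length F)
    e = z - P @ t                                  # residual not explained by the model (length J)
    t2 = t @ np.linalg.solve(T_cal.T @ T_cal, t)   # T2 = t (T'T)^-1 t'
    q = e @ e                                      # Q = e'e
    # Square the two ratios, sum them, and take the square root
    return float(np.hypot(t2 / t2_limit, q / q_limit))

def assign_class(distances):
    """Pick the class with the smallest distance among those below sqrt(2), else None."""
    accepted = {name: d for name, d in distances.items() if d < np.sqrt(2)}
    return min(accepted, key=accepted.get) if accepted else None

# Toy model (hypothetical): 3 variables, 2 components
mean = np.zeros(3)
P = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
T_cal = np.array([[2.0, 0.0], [0.0, 1.0]])
d = class_distance([2.0, 1.0, 3.0], mean, P, T_cal, t2_limit=4.0, q_limit=18.0)
chosen = assign_class({"non-cancer": d, "cancer": 1.2})
```

In a leave-one-out setting, both class models would be refitted without the left-out person before `class_distance` is evaluated for that person.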

[0295] Using this approach the following classification result is obtained.

TABLE 1 Classification results

                          Normal   Cancer
Total                       18       48
Correctly classified         6       48
Incorrectly classified      12        0

[0296] It is observed that 82% of the samples (54 of 66) are correctly classified and no false negatives are obtained.
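The stated percentage follows directly from the counts in Table 1, as the short calculation below shows. The per-class rates (sensitivity and specificity) are not quoted in the text; they are derived here from the same counts, with the "Normal" column taken to be the non-cancer group.

```python
# Counts taken from Table 1
total = {"normal": 18, "cancer": 48}
correct = {"normal": 6, "cancer": 48}

accuracy = sum(correct.values()) / sum(total.values())   # 54 / 66
sensitivity = correct["cancer"] / total["cancer"]         # share of cancer samples detected
specificity = correct["normal"] / total["normal"]         # share of non-cancer samples recognised
false_negatives = total["cancer"] - correct["cancer"]

print(f"accuracy={accuracy:.0%} sensitivity={sensitivity:.0%} specificity={specificity:.0%}")
```

The absence of false negatives therefore comes together with 12 of the 18 non-cancer samples being assigned to the cancer class.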

Example 4

[0297] Fluorescence Measurements of Urine from Cardiac Patients

[0298] This example illustrates the treatment of outlying data and the ability of the invention to classify patients according to cardiac problems.

[0299] Samples

[0300] Eight urine samples were collected from seven men and one woman (post-menopause) who all were diagnosed with angina pectoris (samples #1-8). No other information was available from these patients. For comparison, urine samples were collected from five arbitrarily chosen men (samples #9-13).

[0301] Measurements

[0302] Excitation-emission matrices were measured on the undiluted samples in a cuvette with 2 mm light path using front-face geometry on a Perkin-Elmer LS50B spectrofluorometer. In 28 consecutive scans the excitation wavelength was shifted in 10 nm steps from 230 to 500 nm. Emission intensity was recorded starting 20 nm after the excitation wavelength until two times the excitation wavelength minus offset (or 900 nm). Thus, neither first nor second order Rayleigh scatter was recorded. Emission intensity was measured in intervals of 0.5 nm. Spectral bandwidth on both monochromators was 5 nm. Scan rate was 500 nm/min. In total, fluorescence intensity was measured at 17828 different combinations of excitation and emission wavelengths and the values exported to Matlab, version 5.2.1.
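The number of measured combinations can be reproduced from the scan settings given above. The size of the offset is not stated in the text; the sketch below assumes an offset of 20 nm, which is an assumption, although it does reproduce the stated total of 17828. The function name is hypothetical.

```python
def n_emission_points(ex_nm, gap_nm=20, offset_nm=20, max_nm=900, step_nm=0.5):
    """Number of emission readings for one excitation wavelength (offset_nm is assumed)."""
    start = ex_nm + gap_nm                      # 20 nm after the excitation wavelength
    stop = min(2 * ex_nm - offset_nm, max_nm)   # below second-order scatter, capped at 900 nm
    return int(round((stop - start) / step_nm)) + 1

excitations = range(230, 501, 10)               # 28 scans, 230 to 500 nm in 10 nm steps
total_points = sum(n_emission_points(ex) for ex in excitations)
print(len(excitations), total_points)
```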

[0303] Results

[0304] Although the data have a three-way structure (samples×excitation wavelength×emission wavelength), this is disregarded and the data are rearranged to a two-way structure (samples×combination of excitation and emission wavelength) as illustrated in FIG. 6. In this figure the 28 spectra, arranged successively on an increasing wavelength scale, are displayed in an overlay fashion. The two-way matrix thus obtained is centered by subtracting from each column its average value. By means of Principal Component Analysis this centered matrix is modelled by three principal components obtained from a singular value decomposition of the centered matrix.
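The rearrangement, centering and decomposition can be sketched as below. For simplicity the sketch assumes a regular grid with the same number of emission points at every excitation wavelength, which is not the case for the measured data (there the emission range differs per scan and the spectra would be concatenated instead). The array sizes and random values are purely illustrative.

```python
import numpy as np

# Synthetic three-way array: 13 samples x 28 excitation x 50 emission (illustrative sizes)
rng = np.random.default_rng(1)
eem = rng.random((13, 28, 50))

# Unfold: the 28 emission spectra of each sample are laid end to end in one row
X = eem.reshape(eem.shape[0], -1)
Z = X - X.mean(axis=0)                 # subtract each column's average

# Three principal components from a singular value decomposition
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
F = 3
scores = U[:, :F] * s[:F]
loadings = Vt[:F].T
Z_hat = scores @ loadings.T            # rank-3 model of the centred matrix
```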

[0305] An initial analysis reveals that patient #5 is quite extreme as compared to the remaining patients. This is illustrated in FIG. 7 in a so-called influence plot. In a two-component model patient five has a very large residual variation (upper left corner) whereas in a three-component model patient five has an extremely high leverage (lower right corner). This result shows that after two components most data are well described except for patient five. Consequently, in the third component the fifth patient gets a high leverage, which means that this patient is determining this component. This is a typical example of an extreme outlier. If the cause for the extreme behaviour is instrumental, the patient's data must be excluded as an incorrect measurement. If the cause is biological diversity, this diversity must be better represented in the data by incorporating more similar samples. As there are no further data in this specific investigation, the only suitable procedure is to exclude this sample.
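The two quantities shown in an influence plot can be computed as in the following sketch. It assumes leverage is taken as the diagonal of T(TᵀT)⁻¹Tᵀ and residual variation as the per-sample sum of squared residuals; other scalings are in use, and the text does not say which one underlies FIG. 7. The toy data are constructed so that one sample (index 4) carries a direction of its own.

```python
import numpy as np

def influence(Z, n_components):
    """Leverage and residual sum of squares per sample for a PCA model of Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]     # scores
    P = Vt[:n_components].T                        # loadings
    # Diagonal of T (T'T)^-1 T'
    leverage = np.sum(T * np.linalg.solve(T.T @ T, T.T).T, axis=1)
    residual = np.sum((Z - T @ P.T) ** 2, axis=1)
    return leverage, residual

# Toy data (hypothetical): two shared patterns plus one variable driven by sample 4 only
a = np.array([-4, -3, -2, -1, 0, 1, 2, 3, 4], dtype=float)
b = np.array([1, -1, 1, -1, 0, -1, 1, -1, 1], dtype=float)
c = np.array([0, 0, 0, 0, 2, 0, 0, 0, 0], dtype=float)
X = np.column_stack([a, b, c])
Z = X - X.mean(axis=0)

lev2, res2 = influence(Z, 2)   # two components: sample 4 has the largest residual
lev3, res3 = influence(Z, 3)   # three components: sample 4 has the largest leverage
```

The same pattern is described for patient five: poorly fitted by two components, then dominating the third.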

[0306] Indirectly, the appearance of the outlying sample is an important illustration of one of the very important benefits of using exploratory data analysis and having many physical parameters at disposal. Had it not been possible to detect the outlying sample, conclusions from the analysis could have been misleading. The model would be reflecting the difference between the samples as such and the extreme sample five, rather than explaining the inter-differences and patterns between all samples. The availability and use of this evaluating tool during the model-building step shows that quality-deteriorating samples can be excluded, thus leading to improved models with improved validity.

[0307] Refitting the principal component model without sample five, a score plot is obtained as shown in FIG. 8. Only the first two score vectors, PC1 and PC2, are displayed. The most important latent variables, PC1 and PC2, represent 97% of the original variation in the fluorescence data obtained in this investigation. The samples separate into two distinct clusters: those below the dashed line in the lower left corner all represent samples from persons diagnosed with angina pectoris, while the samples in the upper right corner all represent persons that are not diagnosed with angina pectoris. Thus, it is clearly possible to separate diseased from healthy persons based on the fluorescence data alone.

[0308] It is indeed a significant finding that the fluorescence data so clearly separate the two groups. Importantly, it is not merely the intensity of fluorescence that separates the patients. Differences in intensities are normally reflected in the first principal component of spectral data. In this case, however, the second component is also important for obtaining separation. In fact, the third component (not shown) also helps in obtaining further separation. This result indicates that more subtle spectral components can be increasingly helpful in the discrimination between the groups.

[0309] As a larger data set becomes available the procedure outlined here will easily be formalised into a classification model that can identify persons with cardiac problems within a population of otherwise healthy individuals.

Example 5

[0310] Fluorescence Measurements of Urine Samples with Added Bacteria

[0311] The purpose of the example was to investigate if fluorescence spectra measured directly on urine samples correlate with different added levels of bacteria in the urine.

[0312] Samples

[0313] Seven urine samples spiked with 10² to 10⁸ E. coli bacteria per ml and a control sample with no bacteria added were used. The eight samples were delivered by Alice Friis-Møller, Hvidovre Hospital, and kept in a freezer until measurement.

[0314] Measurements

[0315] The samples were measured at front face at room temperature in a 2 mm cuvette on a Perkin Elmer LS50B (Copenhagen University). The excitation wavelength interval range was 230-400 nm (10 nm steps) and the excitation and emission slits were 4 nm and 3 nm, respectively. The scan rate was 1000 nm/min. The data were imported to Matlab using every 5th emission wavelength, giving a step of 2.5 nm in the emission scans. An important note is that the samples were measured in a sequence corresponding to the increase in bacteria content.

[0316] Results

[0317] Raw Data

[0318] The raw unfolded data are averaged with a factor 10 over the variables so the matrix dimensions become 8 samples×384 variables. This corresponds to circa 25 nm steps in the emission scans (FIG. 9).
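Averaging over the variables with a given factor can be sketched as a simple block average along the columns. The handling of a final, shorter block is an assumption (here it is averaged over whatever columns remain), since the text does not describe it; the function name and toy matrix are hypothetical.

```python
import numpy as np

def block_average(X, factor):
    """Average consecutive groups of `factor` columns; a shorter last group is averaged as is."""
    n_samples, n_vars = X.shape
    n_blocks = -(-n_vars // factor)            # ceiling division
    out = np.empty((n_samples, n_blocks))
    for b in range(n_blocks):
        out[:, b] = X[:, b * factor:(b + 1) * factor].mean(axis=1)
    return out

# Toy matrix: 2 samples x 25 variables, averaged with a factor 10
X = np.arange(50, dtype=float).reshape(2, 25)
X_avg = block_average(X, 10)
```

With emission readings spaced 2.5 nm apart, a factor of 10 gives the circa 25 nm steps mentioned above.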

[0319] PCA

[0320] A PCA is performed on both the mean centered and the auto scaled unfolded spectral data. Variables with standard deviation of 0 and variables including missing values (NaNs) are excluded. In FIGS. 10 and 11 the score plots from these models are shown. No large differences are seen between the auto scaled and the mean centered models.
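The two pre-treatments and the exclusion of unusable variables can be sketched as follows. The sketch assumes that auto scaling means centering followed by division by the sample standard deviation (n-1 in the denominator); the function name and the small example matrix are hypothetical.

```python
import numpy as np

def preprocess(X, autoscale=False):
    """Drop columns with NaNs or zero standard deviation, then centre (and optionally scale)."""
    has_nan = np.isnan(X).any(axis=0)
    Xc = X[:, ~has_nan]
    std = Xc.std(axis=0, ddof=1)
    varying = std > 0
    Xc = Xc[:, varying]
    Z = Xc - Xc.mean(axis=0)                   # mean centering
    if autoscale:
        Z = Z / std[varying]                   # unit variance per variable
    return Z

X = np.array([[1.0, 5.0, np.nan, 2.0],
              [2.0, 5.0, 1.0,    4.0],
              [3.0, 5.0, 2.0,    6.0]])
Z_centered = preprocess(X)                     # constant and NaN columns are removed
Z_scaled = preprocess(X, autoscale=True)
```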

[0321] PC1 clearly reflects the increase in bacteria content.

[0322] PLS Models

[0323] Mean centered PLS models with fluorescence spectra as the independent variables and the logarithm of the bacteria content (10² to 10⁸) as the dependent variable are developed.

[0324] It is observed that there is a strongly non-linear relationship between the spectra and the number of bacteria cells, but the function is monotone (FIG. 12).
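A mean centered PLS model with a single dependent variable can be sketched with the NIPALS form of PLS1, as below. The text does not state which PLS algorithm or software routine was used, so this choice is an assumption; the spectra are synthetic and, unlike the measured data, respond linearly to the logarithm of the bacteria content. All function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit a mean-centred PLS1 model (NIPALS); returns (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)                 # weight vector
        t = E @ w                              # score vector
        tt = t @ t
        p = E.T @ t / tt                       # X loading
        c = f @ t / tt                         # y loading
        E = E - np.outer(t, p)                 # deflate X
        f = f - c * t                          # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef

def pls1_predict(X, model):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

rng = np.random.default_rng(2)
counts = 10.0 ** np.arange(2, 9)               # 10^2 ... 10^8 bacteria per ml
y = np.log10(counts)                           # dependent variable
X = np.outer(y, rng.random(30)) + 0.01 * rng.normal(size=(7, 30))   # synthetic spectra
model = pls1_fit(X, y, n_components=1)
y_fit = pls1_predict(X, model)
```

The control sample without added bacteria has no defined logarithm and is therefore left out of this sketch.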

[0325] Conclusions

[0326] There seems to be a non-linear relationship between the spectra and the number of bacteria cells. This can be corrected for by some non-linear transformation of e.g. the dependent variable. Furthermore, it is important to note that the measurement order is crucial and should be randomised in a follow-up study.

Example 6

[0327] Basic Fluorescence Measurements of Blood Plasma

[0328] The purpose of the example was to investigate the effect of dilution and pH on the fluorescence spectra measured on blood plasma. Both transmission and front face sample presentations were investigated.

[0329] Samples

[0330] A 10 ml pool of blood plasma samples from Hvidovre Hospital was produced. Samples were taken from this pool and diluted with 0.9% sterile NaCl, and the following dilutions were performed for front face:

1:2      3.5 ml pool + 3.5 ml NaCl (A)
1:5      2.0 ml A + 3.0 ml NaCl (B)
1:10     1.0 ml A + 4.0 ml NaCl (C)
1:25     2.0 ml C + 3.0 ml NaCl
1:50     0.5 ml B + 4.5 ml NaCl
1:100    1.0 ml pool + 99.0 ml NaCl (F)
1:200    2.5 ml F + 2.5 ml NaCl
1:500    1.0 ml F + 4.0 ml NaCl
1:700    1.0 ml F + 6.0 ml NaCl
1:2000   0.25 ml F + 4.75 ml NaCl
1:3000   0.20 ml F + 5.8 ml NaCl
1:5000   0.10 ml F + 4.9 ml NaCl

and for transmission:

1:2      2.8 ml pool + 2.8 ml NaCl (A)
1:5      2.0 ml A + 3.0 ml NaCl (B)
1:10     1.0 ml A + 4.0 ml NaCl (C)
1:25     2.0 ml C + 3.0 ml NaCl
1:50     0.5 ml B + 4.5 ml NaCl
1:100    0.25 ml pool + 24.75 ml NaCl (F)
1:200    2.5 ml F + 2.5 ml NaCl
1:500    1.0 ml F + 4.0 ml NaCl
1:700    1.0 ml F + 6.0 ml NaCl
1:1000   0.3 ml F + 2.7 ml NaCl
1:2000   0.25 ml F + 4.75 ml NaCl
1:3000   0.20 ml F + 5.8 ml NaCl
1:5000   0.10 ml F + 4.9 ml NaCl
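The nominal dilution factors of the front face series can be checked against the listed volumes with exact arithmetic. The helper name `diluted` is hypothetical; the volumes are those given above.

```python
from fractions import Fraction

def diluted(stock_factor, stock_ml, nacl_ml):
    """Overall dilution factor after mixing stock_ml of a stock with nacl_ml of NaCl."""
    stock_ml, nacl_ml = Fraction(stock_ml), Fraction(nacl_ml)
    return stock_factor * (stock_ml + nacl_ml) / stock_ml

pool = Fraction(1)
A = diluted(pool, "3.5", "3.5")      # 1:2
B = diluted(A, "2.0", "3.0")         # 1:5
C = diluted(A, "1.0", "4.0")         # 1:10
F = diluted(pool, "1.0", "99.0")     # 1:100

# Nominal factor -> factor computed from the listed volumes
series = {
    25: diluted(C, "2.0", "3.0"),
    50: diluted(B, "0.5", "4.5"),
    200: diluted(F, "2.5", "2.5"),
    500: diluted(F, "1.0", "4.0"),
    700: diluted(F, "1.0", "6.0"),
    2000: diluted(F, "0.25", "4.75"),
    3000: diluted(F, "0.20", "5.8"),
    5000: diluted(F, "0.10", "4.9"),
}
all_match = all(nominal == actual for nominal, actual in series.items())
```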

[0331] Buffers were produced as follows:

[0332] pH circa 9: 0.1 M NaH₂PO₄ (1.785 g NaH₂PO₄ to 100 ml H₂O).

[0333] pH circa 4: 0.1 M Na₂HPO₄ (1.382 g Na₂HPO₄ to 100 ml H₂O).

[0334] pH circa 7: 1.379 g NaH₂PO₄+1.787 g Na₂HPO₄ to 100 ml H₂O.

[0335] pH circa 1: 0.1 M HCl (0.81 ml concentrated HCl to 100 ml H₂O).

[0336] and dilutions for the buffer experiment were:

1:2      2.0 ml pool + 2.0 ml buffer
1:10     0.4 ml pool + 3.6 ml buffer (B)
1:200    0.2 ml B + 3.8 ml buffer
1:1000   0.1 ml B + 9.9 ml buffer

[0337] All possible combinations of pH levels and dilutions were measured, resulting in 16 (4×4) spectral landscapes. Both front face and transmission were tested.

[0338] Measurements

[0339] The samples were defrosted and measured at room temperature. The samples were measured in a standard 10×10 mm cuvette on a Perkin Elmer LS50B. The excitation wavelength interval range was 230-400 nm (10 nm steps) and the excitation and emission slits were 4 nm and 3 nm, respectively. The scan rate was 1000 nm/min. The data were imported to Matlab using every 5th emission wavelength, giving a step of 2.5 nm in the emission scans.

[0340] Results

[0341] Experiment 1: Dilution of Blood Samples Measured in Front Face Mode

[0342] Raw Data

[0343] In FIG. 13 examples of front face fluorescence spectra of an undiluted sample and the same sample diluted 1:5000 are shown. Very different spectral signals, both with respect to intensity and shape, are obtained for the two samples.

[0344] At low excitation wavelengths the signal intensities at first increase with dilution (probably due to quenching), followed by a decrease in signal intensity upon further dilution. At high excitation wavelengths an (almost linear) decrease in signal intensity is seen with dilution. This is illustrated in FIGS. 14 and 15.

[0345] PCA

[0346] A PCA is performed on the mean centered unfolded spectral data. Variables with standard deviation of 0 and variables including missing values (NaNs) are excluded. Systematic score patterns are seen for up to 5 to 6 PCs and FIG. 16 shows the scores for 1 to 4 PCs. Variance explained is 79.38%, 16.32%, 2.60%, 1.45% and 0.24% for the first 5 PCs, respectively.
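The cumulative variance implied by these per-component figures is obtained by a running sum, as in this short calculation (the percentages are those reported above):

```python
from itertools import accumulate

explained = [79.38, 16.32, 2.60, 1.45, 0.24]   # percent, PCs 1 to 5 (front face mode)
cumulative = [round(c, 2) for c in accumulate(explained)]
print(cumulative)
```

The first two components thus account for 95.70% and the first five for 99.99% of the variance.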

[0347] Experiment 2: Dilution of Blood Samples Measured in Transmission Mode

[0348] Raw Data

[0349] In FIG. 17 examples of transmission fluorescence spectra of an undiluted sample and the same sample diluted 1:5000 are shown. Again, very different spectral signals, both with respect to intensity and shape, are obtained for the two samples.

[0350] At low excitation wavelengths the signal intensities at first increase with dilution (probably due to quenching), followed by a decrease in signal intensity upon further dilution. At high excitation wavelengths an (almost linear) decrease in signal intensity is seen with dilution. This is illustrated in FIGS. 18 and 19.

[0351] PCA

[0352] A PCA is performed on the mean centered unfolded spectral data. Variables with standard deviation of 0 and variables including missing values (NaNs) are excluded. Systematic score patterns are seen for up to 5 to 6 PCs, as for front face mode, and FIG. 20 shows the scores for 1 to 4 PCs. Variance explained is 85.43%, 11.33%, 1.91%, 0.99% and 0.30% for the first 5 PCs, respectively.

[0353] PLS Models

[0354] Mean centered PLS models with fluorescence spectra as the independent variables and the dilution factor as the dependent variable are developed for both front face and transmission mode measurements.

[0355] It is observed that there is a linear relationship from dilution factor 1:25 to 1:5000 for front face and from 1:50 to 1:5000 for transmission. It seems that the relationship is non-linear below these dilution factors. PLS modelling was also tested with auto scaled data to give the high excitation wavelengths more influence. Similar results were obtained, although the linear range of the models could be expanded to approx. 1:10. PLS modelling was also tested with log(dilution factor), and good linear models over the whole range were obtained. Only the undiluted sample seemed to deviate a little.

[0356] Experiment 3: Effect of pH & Dilution on the Measured Spectra

[0357] PCA

[0358] PCA is performed on each of the data sets recorded in transmission and front face mode. A score plot from a PCA on all samples is shown in FIG. 23.

[0359] No large differences are seen with respect to pH levels, see also FIG. 24.

[0360] Conclusions

[0361] It is important to measure at least two (or even better three or four) different dilutions of the blood plasma samples: the undiluted sample and the same sample diluted 1:2 in front face and diluted 1:200/1:100 in transmission. Not surprisingly, front face intensities are higher for samples measured at no or low dilution factors, while the opposite holds for samples with high dilution factors. Note that it is possible to measure in transmission mode on the undiluted sample. The tested pH levels do not seem to have large effects on the measured spectral shapes or intensities.

[0362] Reference List

[0363] 1. Alsberg B K, Goodacre R, Rowland J J, Kell D B, Classification of pyrolysis mass spectra by fuzzy multivariate rule induction - comparison with regression, k-nearest neighbour, neural and decision-tree methods, Analytica Chimica Acta, 1997, 348, 389-407.

[0364] 2. Bartholomew D J, The foundation of factor analysis, Biometrika, 1984, 71, 221-232.

[0365] 3. Björkström A, Sundberg R, A generalized view on continuum regression, Scandinavian Journal of Statistics, 1999, 26, 17-30.

[0366] 4. Bro R, PARAFAC. Tutorial and applications, Chemom Intell Lab Syst, 1997, 38, 149-171.

[0367] 5. Bro R, Multiway calibration. Multi-linear PLS, Journal of Chemometrics, 1996, 10, 47-61.

[0368] 6. Bro R, Multi-way Analysis in the Food Industry. Models, Algorithms, and Applications. Ph.D. thesis, University of Amsterdam (NL), 1998.

[0369] 7. Bro R, Exploratory study of sugar production using fluorescence spectroscopy and multi-way analysis, Chemom Intell Lab Syst, 1999, 46, 133-147.

[0370] 8. Cheng B, Titterington D M, Neural Networks: A Review from a Statistical Perspective, Statistical Science, 1994, 9, 2-54.

[0371] 9. de Jong S, Kiers H A L, Principal covariates regression. Part 1. Theory, Chemom Intell Lab Syst, 1992, 14, 155-164.

[0372] 10. Esbensen K, Wold S, SIMCA, MACUP, SELPLS, GDAM, SPACE & UNFOLD: The way towards regionalized principal components analysis and subconstrained N-way decomposition - with geological illustrations, Proc Nord Symp Appl Statist, Stavanger, 1983.

[0373] 11. Faber N M, Buydens L M C, Kateman G, Generalized rank annihilation method. I: Derivation of eigenvalue problems, Journal of Chemometrics, 1994, 8, 147-154.

[0374] 12. Golub G H, Hansen P C, O'Leary D, Tikhonov Regularization and Total Least Squares, SIAM Journal of Numerical Analysis, 1999, 21, 185-194.

[0375] 13. Grung B, Manne R, Missing values in principal component analysis, Chemom Intell Lab Syst, 1998, 42, 125-139.

[0376] 14. Indahl U G, Sahni N S, Kirkhus B, Naes T, Multivariate strategies for classification based on NIR-spectra - with application to mayonnaise, Chemom Intell Lab Syst, 1999, 49, 19-31.

[0377] 15. Jackson J, A Users Guide to Principal Components. Wiley & Sons, New York, 1991.

[0378] 16. Kohonen T, Self-organized formation of topologically correct feature maps, Biological Cybernetics, 1982, 43, 59-69.

[0379] 17. Kruskal J B, Harshman R A, Lundy M E, Some relationships between Tucker's three-mode factor analysis and PARAFAC/CANDECOMP, 1983.

[0380] 18. Martens H, Naes T, Multivariate calibration. John Wiley & Sons, Chichester, 1989; Martens H, Naes T, Multivariate calibration by data compression, Near Infrared Technology in the Agricultural and Food Industries, (Eds. Williams, P and Norris, K), The American Association of Cereal Chemists, Inc., St. Paul, 1987, 57-87.

[0381] 19. Naes T, Isaksson T, Some modifications of locally weighted regression (LWR), NIR news, 1994, 5, 8-9.

[0382] 20. Rajko R, Treatment of model error in calibration by robust and fuzzy procedures, Analytical Letters, 1994, 27, 215-228.

[0383] 21. Wold S, Dunn W J, III, Multivariate quantitative structure-activity relationships (QSAR): conditions for their applicability, J Chem Inf Comput Sci, 1983, 23, 6-13.

[0384] 22. Wold S, Cross-validatory estimation of the number of components in factor and principal components models, Technometrics, 1978, 20, 397-405.

[0385] 23. Wold S, Albano C, Dunn W J, III, Edlund U, Esbensen K H, Geladi P, Hellberg S, Johansson E, Lindberg W, Sjöström M, Multivariate data analysis in chemistry, Chemometrics. Mathematics and Statistics in Chemistry, (Ed. Kowalski, B R), D. Reidel Publishing Company, Dordrecht, 1984, 17-95.

1. A method of training a classification system for characterising a biological sample with respect to at least one condition, comprising a) obtaining a biological sample from an animal, including a human, wherein said biological sample is selected from body fluids and/or tissue, wherein the tissue sample is not associated with said condition(s), b) obtaining characterisation information related to each biological sample, c) exposing the sample to excitation light within a predetermined range of wavelength, d) determining physical parameter(s) of light emitted from the sample, e) repeating step a) to d) until the physical parameters of all training samples have been determined, f) optionally performing a data handling of the obtained physical parameters obtaining data variables, g) optionally performing a multivariate data analysis of the data variables and optionally of characterisation information obtaining model parameters describing the variation of the data variables, h) classifying the biological samples into at least two different classes correlated to the characterisation information, obtaining a trained classification system.
2. The method according to claim 1, whereby step g) further comprises selection of latent variables being weighted averages of data variables.
3. The method according to claim 1, wherein the biological sample is selected from blood, serum, plasma, saliva, urine, milk, cerebrospinal fluid, tears, nasal secrete, semen, bile, lymph, sweat and/or faeces.
4. The method according to claim 1, wherein the biological sample is a tissue sample.
5. The method according to claim 4, wherein the tissue sample is a biopsy of tissue selected from muscle, cutis, subcutis, kidney, brain, and liver or a sample of hair or nails.
6. The method according to claim 3, wherein the biological sample is urine, milk, blood, plasma or serum.
7. The method according to claim 1, wherein the wavelength of the excitation light is in the range of from 100 nm to 1000 nm, such as from 100 to 800 nm.
8. The method according to claim 7, wherein the wavelength of the excitation light is in the range of from 200 nm to 800 nm, such as from 200 nm to 600 nm.
9. The method according to claim 1, wherein the physical parameter determined is selected from fluorescence intensity, fluorescence lifetime, phosphorescence intensity, phosphorescence lifetime, polarisation, polarisation lifetime, anisotropy, anisotropy lifetime, phase-resolved emission, circularly polarised fluorescence, fluorescence-detected circular dichroism, and any time dependence of the two last mentioned parameters.
10. The method according to claim 1, wherein the spectral distribution of light emitted ranging from 200 nm to 800 nm is generated.
11. The method according to claim 2, wherein the ratio of number of training samples to the expected number of latent variables is at least 5:1, preferably at least 10:1.
12. The method according to claim 1, wherein the multivariate data analysis is selected from: Principal component analysis, principal component regression, factor analysis, partial least squares, fuzzy clustering, artificial neural networks, parallel factor analysis, Tucker models, generalised rank annihilation method, locally weighted regression, ridge regression, total least squares, principal covariates regression, Kohonen networks, linear or quadratic discriminant analysis, k-nearest neighbours based on rank-reduced distances, multilinear regression methods, soft independent modelling of class analogies, robustified versions of the above and/or obvious non-linear versions such as one obtained by allowing for interactions or cross-products of variables, exponential transformations etc.
13. The method according to claim 1, wherein the data handling of step f) is selected from a one-way matrix of spectral information, a two-way matrix of spectral information, a three-way matrix of spectral information, a four-way matrix of spectral information and a five-way or higher order matrix of spectral information.
14. The method according to claim 1, wherein other variable(s) is included in the multivariate analysis of step g).
15. The method according to claim 14, wherein the other variable(s) is selected from a pH value of the sample, concentration of various electrolytes in the sample, concentration of any other relevant compound in the sample, temperature, chemical parameters or any other physical property of the sample.
16. The method according to claim 1, wherein other variable(s) related to the animal, including a human being, is included in the multivariate analysis of step g).
17. The method according to claim 16, wherein the other variable(s) is selected from any parameter relating to the bodily or mental condition, hair colour, skin colour, age, sex, geographic origin, affiliation, hereditary background, stress level, medical diagnosis, subjective evaluations or clinical parameters.
18. The method according to claim 1, wherein the sample is pre-treated before subjecting the sample to step c).
19. The method according to claim 18, wherein the pre-treatment comprises adjustment of pH of the sample to a predetermined value.
20. The method according to claim 1, wherein a classification system for diagnostic purposes with relation to heart diseases is obtained.
21. The method according to claim 1, wherein a classification system for diagnostic purposes with relation to abuse of medicine or narcotics is obtained.
 22. A diagnostic classificationsystem comprising a) a sample domain for comprising a biological sample,b) light means for exposing the sample to excitation light in the sampledomain, c) a detecting means recording the physical parameter(s) oflight emitted from the sample, d) optionally computing means forperforming data handling of the physical parameters, obtaining datavariables, e) optionally processing means for providing model parametersfrom data variables of the sample, f) at least one storage means forstoring physical parameters and/or data variables and/or modelparameters of the biological sample, g) at least one storage means forstoring physical parameters and/or data variables and/or modelparameters and characterisation information of a trained classificationsystem, h) means for correlating physical parameters and/or datavariables and/or model parameters from the sample with physicalparameters and/or data variables and/or model parameters of the trainedsystem, and i) means for displaying the characterisation class(es) of asample.
 23. The system according to claim 22, wherein the model parameters are latent variables being weighted averages of the data variables.
 24. The system according to claim 22, wherein the biological sample is a liquid sample, such as a sample selected from blood, serum, saliva, milk, urine, cerebrospinal fluid, tears, nasal secrete, semen, bile, lymph, sweat and/or faeces.
 25. The system according to claim 22, wherein the biological sample is a tissue sample.
 26. The system according to claim 25, wherein the tissue sample is a biopsy of tissue selected from muscle, cutis, subcutis, kidney, brain, and liver or a sample of hair or nails.
 27. The system according to claim 24, wherein the biological sample is urine, milk, blood, plasma or serum.
 28. The system according to claim 22, wherein the light means is arranged to emit light having a wavelength in the range of from 100 nm to 1000 nm, such as from 100 nm to 800 nm.
 29. The system according to claim 28, wherein the light means is arranged to emit light having a wavelength in the range of from 200 nm to 800 nm, such as from 200 nm to 600 nm.
 30. The system according to claim 22, wherein the physical parameter determined is selected from fluorescence intensity, fluorescence lifetime, phosphorescence intensity, phosphorescence lifetime, polarisation, polarisation lifetime, anisotropy, anisotropy lifetime, phase-resolved emission, circularly polarised fluorescence, fluorescence-detected circular dichroism, and any time dependence of the two last-mentioned parameters.
 31. The system according to claim 22, wherein the detecting means is selected from a photomultiplier, a scanning camera, for example a vidicon, a CCD camera, a CMOS, or a diode array.
 32. The system according to claim 22, being divided into at least a first unit and a second unit, wherein said first unit comprises the parts a) to at least c) of the system, and the second unit comprises the other parts.
 33. The system according to claim 22, further including means for measuring other variable(s) of the sample.
 34. The system according to claim 33, wherein the other variable(s) is selected from a pH value of the sample, concentration of various electrolytes in the sample, concentration of any other relevant compound in the sample, temperature, chemical parameters or any other physical property of the sample.
 35. The system according to claim 22, further including means for entering other variables.
 36. The system according to claim 35, wherein the other variable(s) is selected from any parameter relating to the bodily or mental condition, hair colour, skin colour, age, sex, geographic origin, affiliation, hereditary background, stress level, medical diagnosis, subjective evaluations or clinical parameters.
 37. The system according to claim 22, wherein the sample is pre-treated before subjecting the sample to step b).
 38. The system according to claim 37, wherein the pre-treatment comprises adjustment of pH of the sample to a predetermined value.
 39. The system according to claim 22, being a classification system for diagnostic purposes with relation to heart diseases.
 40. The system according to claim 22, being a classification system for diagnostic purposes with relation to abuse of medicine or narcotics.
 41. A method for characterising a biological sample of an animal, including a human, comprising a) obtaining a biological sample from the animal or human, b) exposing the sample to excitation light, c) determining the physical parameter(s) of light emitted from the sample, d) optionally performing a data handling of the obtained physical parameters obtaining data variables, e) storing the physical parameters and/or data variables and/or model parameters, f) optionally providing model parameters from data variables of the sample, g) obtaining physical parameters and/or data variables and/or model parameters from a trained classification system, h) correlating physical parameters and/or data variables and/or model parameters from the sample with physical parameters and/or data variables and/or model parameters of the trained system, and i) displaying characterisation class(es) of the sample.
 42. The method according to claim 41, wherein the model parameters are latent variables being weighted averages of the data variables.
 43. The method according to claim 41, wherein the biological sample is selected from blood, serum, plasma, saliva, urine, cerebrospinal fluid, tears, nasal secrete, semen, milk, bile, lymph, sweat and/or faeces.
 44. The method according to claim 41, wherein the biological sample is a tissue sample.
 45. The method according to claim 41, wherein the tissue sample is a biopsy of tissue selected from muscle, cutis, subcutis, kidney, brain, and liver.
 46. The method according to claim 43, wherein the biological sample is urine, blood, milk, or serum.
 47. The method according to claim 41, wherein the wavelength of the excitation light is in the range of from 100 nm to 1000 nm, such as from 100 nm to 800 nm.
 48. The method according to claim 41, wherein the wavelength of the excitation light is in the range of from 200 nm to 800 nm, such as from 200 nm to 600 nm.
 49. The method according to claim 41, wherein the physical parameter determined is selected from fluorescence intensity, fluorescence lifetime, phosphorescence intensity, phosphorescence lifetime, polarisation, polarisation lifetime, anisotropy, anisotropy lifetime, phase-resolved emission, circularly polarised fluorescence, fluorescence-detected circular dichroism, and any time dependence of the two last-mentioned parameters.
 50. The method according to claim 41, wherein the spectral distribution of light emitted ranging from 200 nm to 800 nm is generated.
 51. The method according to claim 41, wherein the data handling of step d) is selected from a one-way matrix of spectral information, a two-way matrix of spectral information, a three-way matrix of spectral information, a four-way matrix of spectral information, and a five-way or higher-order matrix of spectral information.
 52. The method according to claim 41, wherein other variable(s) is included as data variables.
 53. The method according to claim 52, wherein the other variable(s) is selected from a pH value of the sample, concentration of various electrolytes in the sample, concentration of any other relevant compound in the sample, temperature, chemical parameters or any other physical property of the sample.
 54. The method according to claim 41, wherein other variable(s) related to the animal, including a human being, is included as data variables.
 55. The method according to claim 54, wherein the other variable(s) is selected from any parameter relating to the bodily or mental condition, hair colour, skin colour, age, sex, geographic origin, affiliation, hereditary background, stress level, medical diagnosis, subjective evaluations or clinical parameters.
 56. The method according to claim 41, wherein the sample is pre-treated before subjecting the sample to step b).
 57. The method according to claim 56, wherein the pre-treatment comprises adjustment of pH of the sample to a predetermined value.
 58. The method according to claim 41, wherein the trained classification system is a diagnostic heart disease classification system.
 59. The method according to claim 41, wherein the trained classification system is a diagnostic abuse classification system related to abuse of medicine or narcotics.
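By way of illustration only, steps d), f) and h) of claim 41, together with the latent variables of claim 42, might be sketched as follows. The sketch is not part of the claims and makes several assumptions: the function names (`unfold`, `latent_variables`, `classify`), the example numbers, and the nearest-centroid correlation rule are all hypothetical. The weights are assumed to be supplied by a previously trained system and to have a non-zero sum, so that each latent variable is a true weighted average.

```python
import math
from typing import Dict, List, Sequence


def unfold(eem: Sequence[Sequence[float]]) -> List[float]:
    """Step d) (assumed form): flatten one excitation x emission
    intensity matrix into a single vector of data variables."""
    return [value for row in eem for value in row]


def latent_variables(
    data: Sequence[float], weights: Sequence[Sequence[float]]
) -> List[float]:
    """Step f): one latent variable per weight vector, computed as a
    weighted average of the data variables (assumes non-zero weight sums)."""
    scores = []
    for w in weights:
        if len(w) != len(data):
            raise ValueError("weight vector length must match data variables")
        total = sum(w)
        if total == 0:
            raise ValueError("weights must not sum to zero")
        scores.append(sum(wi * xi for wi, xi in zip(w, data)) / total)
    return scores


def classify(
    scores: Sequence[float], trained_centroids: Dict[str, Sequence[float]]
) -> str:
    """Step h) (hypothetical rule): return the class whose stored
    latent-variable centroid is closest in Euclidean distance."""
    return min(
        trained_centroids,
        key=lambda label: math.dist(scores, trained_centroids[label]),
    )


# Made-up example: 2 excitation x 2 emission wavelengths.
sample_eem = [[1.0, 2.0], [3.0, 4.0]]
trained_weights = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
trained_centroids = {"class A": [1.0, 3.0], "class B": [5.0, 9.0]}

data_variables = unfold(sample_eem)
model_parameters = latent_variables(data_variables, trained_weights)
print(classify(model_parameters, trained_centroids))
```

A trained system would normally derive its weights and class information from a multivariate analysis of many reference samples; the fixed values above merely stand in for that stored information.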